WEIGHTING QUESTIONS IN THE ESSAY-TYPE
EXAMINATION 


JOHN M. STALNAKER 
College Entrance Examination Board; Princeton University 


Only recently has the teacher become concerned with the problem 
of appropriate weights for examination questions. In the past, he 
graded each of ten questions on a scale of ten points and added the 
scores to obtain the total grade. The questions, he believed, if he
thought of the matter at all, were weighted equally, or had equal
influence in determining the total score, and in one sense they had.
Since that time, there has come a change in conditions, for better or
for worse, so that the teacher can no longer feel content in asking but
ten questions. He has been told that he must secure a wider sampling 
of the student’s knowledge and ability; he must ask more questions. 
In response to this pressure, he divides and redivides his questions; 
his original ten questions are now replaced by twenty, thirty, or one 
hundred items. There comes a point in this dividing process where 
the lack of equality among the items is inescapable. At that point 
arises the problem of weighting the parts in accordance with some 
a priori judgment of their value.

Appreciable time, and not a little agony, is spent in determining 
the weights, particularly if several teachers are involved. As the 
number of scorable units on an examination increases, the balancing 
of weights becomes highly complex, even for the mathematicians, and
the whole matter of weighting may be, and frequently is, carried to 
the point of absurdity. 

The teacher who uses the modern technique of reading essay 
questions determines the number of distinguishable levels of answer 
to a question; if six qualities can be differentiated, or six points noted, 
he grades the best or correct response five, the next best four, etc.,
down to zero for the poorest. The problem of equating or weighting
the various questions is still to be dealt with. It may be the judgment
of the teacher or subject-matter expert that one question is much 
more significant than another, even though not so many points of 
differentiation are possible. To compensate for the inequality, multi- 
plying factors are introduced. Should Question one be multiplied by 
two, three, four, or some other number, in order to make it equal to 
Question two? 

The psychologist is no doubt to blame for having brought the 
matter of weighting to the attention of test-builders. He and the 
educational statisticians are responsible for having given out dubious 
information about weighting.¹ It is the purpose of this paper to look
into the question of weighting as it affects the ordinary modern essay- 
type examination paper, from both the theoretical and practical points 
of view, and to make recommendations in the light of the findings. 

A study of the grading of examinations and of the interpretation 
of the grades has shown that the frankly comparative method of
reporting the results is the most feasible and defensible. Rather than
maintaining, for example, that certain hurdles must be got over (or 
about) for college admission, the examiner now strives to rank or 
order the candidates according to their performance on the exami- 
nation. The conclusion to be reached from a candidate’s grade on a 
single examination is not that he is or is not ready for college, but 
rather that of the candidates who have gone through certain prescribed 
training, candidate A is very much better than the average in his 
performance; in fact, he is in the upper ten per cent or scores at plus 
1.3 sigma. Candidate B, on the other hand, is slightly below average, 
forty per cent of the candidates getting a lower score, so his score is
minus 0.1 sigma. The colleges consider this information in conjunction
with other pertinent data in determining the candidate’s fitness.

1 T. L. Kelley, for example, states in his Statistical Method, page 200, “...
equally weighted scores are those in which the products of the nominal weights
and the standard deviations are equal; that is, if $w_1\sigma_1 = w_2\sigma_2 = w_3\sigma_3 = \cdots$ etc.,
$X_1$, $X_2$, $X_3$, etc. series of scores are actually weighted equally.” From more
fundamental considerations regarding the total score, this definition is satisfactory
only when the series of X’s are mutually equally correlated. H. E. Garrett, in
the second edition of his Statistics in Education and Psychology, pages 187-190,
illustrates at length the method of weighting scores according to their variability,
and states that if we wish to combine sections or tests into a composite or total,
“the separate tests should be weighted according to the variability of their scores.”
The definitions and proofs necessary to support the method he illustrates are not
explicitly given. From two basic points of view, namely the contribution to total
variance and the correlation between each part and the total, it can be shown that
his method of weighting is meaningless. J. P. Guilford, in his Psychometric
Methods, is open to the same criticism.

The function of the examination thus becomes one of differentiating 
among the candidates, of spreading them out one from another, so 
that it is clear that John Jones knows much more or has more ability 
or aptitude in the field of the examination than the general group. A 
well-constructed examination, it has been said, divides the students 
one from another like the opening out of a fan. The necessary 
ticketing or classifying of the candidates as elect or non-elect, as pass
or fail, as A or F, comes later, and, as has been suggested, is best
done with additional information.

From the standpoint of present-day examinations, one can no
longer say that a question which has a maximum credit of twenty
points has twice the weight of a question with a maximum of ten
points. The influence or weight of an item in the total score is a
function of the differentiating power of the question, and of its relation
to other questions on the same paper. Perhaps the twenty-point
question is very difficult and no candidate scores over eight, and most
of them receive less than four, whereas on the ten-point question
most candidates get a score of six with the other grades well scattered
throughout the range from zero to ten. The question on which some
students get low scores, some average, and some high will automatically
“count more” than the question which is generally answered very well
or one answered very poorly. The question which every candidate
answers perfectly has no effect on spreading the candidates. The
influence of this question on the total score is nil; it merely adds a
number, say five or ten, to each score, and as the magnitude of the
total score is a matter of only comparative significance, the question
is worthless. The question which is answered by no one is equally
ineffective. Equating the questions is not the simple matter it has
been assumed to be.

Another point to be considered is that the scores on the various
items on any subject-matter achievement examination are highly
interrelated. If a candidate receives a high score on one-half of the paper,
he should normally be expected to receive a high score on the other
half; if he scores high on question one, he should in general score high
on question two. If this situation does not prevail, then a single score
on the examination, intended to describe the student’s ability in the
field, is of little value, for the score is in such a case more a function
of the questions selected than of the candidate’s knowledge of the
field. The candidate would frequently do very differently on another 
set of questions covering the same general subject. If the scores on 
the questions are highly interrelated, as they usually are in the better 
essay examinations, then weighting becomes an even less important 
issue. When the relationship between the scores on two parts of an 
examination is perfect (i.e. when they are linearly related), the rank 
order of the candidates, be they a million or a hundred million in 
number, will be precisely the same down to the last individual, regard- 
less of how the parts are weighted. The higher the intercorrelations 
among the parts, the less the influence of weights will be. 
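
The limiting case just described can be checked with a small sketch. The scores below are hypothetical, and the sketch assumes what the argument tacitly assumes: the two parts are related by an exact increasing linear function and every weight is positive.

```python
# Hypothetical scores: part_2 is an exact increasing linear function of part_1.
part_1 = [3, 7, 5, 9, 1, 6]
part_2 = [2 * x + 4 for x in part_1]

def rank_order(weights):
    """Return candidate indices sorted from highest to lowest weighted total."""
    w1, w2 = weights
    totals = [w1 * a + w2 * b for a, b in zip(part_1, part_2)]
    return sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)

# Four arbitrary (positive) weightings of the two parts.
orders = [rank_order(w) for w in [(1, 1), (5, 1), (1, 10), (0.3, 2.5)]]
all_same = all(order == orders[0] for order in orders)
print(orders[0], all_same)
```

Every weighting yields the same ordering of the six candidates, because each weighted total is itself an increasing linear function of the first part score.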

The importance of proper weighting in certain cases is not to be 
minimized, nor should one fall into the error of thinking that because 
no specific weighting factors are assigned the items are weighted 
equally. Weighting is indicated in certain cases. It is conceivable, 
for example, though improbable, that one might wish to weight equally 
one section of a test which has a maximum score of two hundred fifty 
and a standard deviation of fifty and another section of the test con- 
sisting of a single long essay question graded on a scale of zero to five 
with a standard deviation of 1.5. To add two such scores without 
the use of weighting factors is a dubious procedure; it is, however, 
almost as questionable to multiply the second score by the large factor 
necessary before adding the two. In this case the entire examination 
set-up should probably be revised. 
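
The size of the factor in question follows from the figures just given. The sketch below assumes that "weighting equally" is read in the commonly advised sense of equalizing the standard deviations of the two sections; that reading is an assumption made only for illustration.

```python
# Figures taken from the example in the text.
sd_long_section = 50.0    # standard deviation of the 250-point section
sd_essay = 1.5            # standard deviation of the essay graded zero to five
essay_maximum = 5

# Factor that would give the essay the same standard deviation as the long section.
factor = sd_long_section / sd_essay
essay_maximum_after = essay_maximum * factor

print(round(factor, 1), round(essay_maximum_after, 1))
```

With a factor of about thirty-three, a disagreement of a single point between two readers of the essay becomes a difference of about thirty-three points in the total.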

It is beyond the scope of this paper to treat the matter of weighting
in a rigorous mathematical way, but it may be worth while to consider
some of the statistical theory involved.¹ It is desirable first to
define what is meant by “equal weighting” in the testing situation.
The teacher means, quite clearly, that the maximum possible scores
on the parts are equal. The statisticians usually mean that the parts
make equal contribution to the variance (not standard deviation) of
the total scores.² A third definition, and possibly the most meaningful
as applied to tests, is that the parts have equal weight when they
are equally related to the total score which is finally used for ranking
or grading the candidates. Actually, weighting parts according to
their standard deviations, the commonly advised method, does not
necessarily give equal weights according to any of the three definitions,
except under special circumstances.

1 Many of the mathematical and statistical questions involved in the problem
of weighting which have arisen during the preparation of this paper have been
discussed with S. S. Wilks, the consulting statistician of the College Entrance
Examination Board, and his aid has been appreciable. He is now preparing a
paper treating the matter mathematically. His paper, “Optimum Sets of Coefficients
in Linear Functions of Correlated Variables When There Is No Dependent
Variable,” will be published soon.

2 The variance is the square of the standard deviation.

The lack of justification for the use of the first definition (weighting 
according to total possible scores) with the modern method of reporting 
final scores has already been discussed. Consider the second defi- 
nition (equal contribution to total variance), the one usually tacitly 
assumed in weighting tests according to their variability. The vari- 
ance of a sum includes one or more product terms, each containing 
the correlation between two parts and the standard deviations of the 
same two parts. It is not easy to break this down so as to show just 
how much of the term is contributed by each standard deviation. If 
there are only two parts to be weighted and if the weights are equal, 
then weighting according to the standard deviations gives equal contributions
of the two parts to the total variance. If there are several
parts, say $n$, and the weights for the parts are $k_1, k_2, k_3, \ldots, k_n$
(in practice usually small integers), then expressing the scores in
standard-score form (in terms of standard deviation units), using
multiplying factors of $k_1, k_2, \ldots, k_n$, and adding the results, does not
by any means give total scores the variance of which is contributed
to by the parts in proportion to the factors $k_1, k_2, \ldots, k_n$. Somewhat
more complex but available methods (almost never used in
educational testing work) must be employed if it is desired to have 
each part contribute a predetermined amount to the total variance of 
the combined scores. 
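
The product terms referred to above can be written out explicitly. The notation is introduced here only for illustration: $\sigma_S^2$ denotes the variance of the total, $\sigma_i$ the standard deviation of part $i$, and $r_{ij}$ the correlation between parts $i$ and $j$.

```latex
% Variance of the weighted sum S = k_1 X_1 + ... + k_n X_n:
\sigma_S^2 \;=\; \sum_{i=1}^{n} k_i^2 \sigma_i^2
          \;+\; 2 \sum_{i<j} k_i k_j \, r_{ij} \, \sigma_i \sigma_j .

% In standard-score form (every \sigma_i = 1) with two parts:
\sigma_S^2 \;=\; k_1^2 + k_2^2 + 2 k_1 k_2 \, r_{12} .
```

As a worked case, take $k_1 = 2$, $k_2 = 1$, and $r_{12} = .6$. The total variance is $4 + 1 + 2.4 = 7.4$, and the product term $2.4$ belongs to neither part alone. If that term is divided evenly between the parts (one possible convention, assumed here and not prescribed by the text), the contributions are $5.2$ and $2.2$, a ratio of about $2.4$ to $1$, which is neither the $2$ to $1$ of the factors nor the $4$ to $1$ of their squares.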

The third definition (equally weighted parts are those equally 
related to total score) is one probably implied when teachers speak of 
having one section count n times another in a total score. Actually, 
if converted or transformed scores are used for the final reporting, the 
teacher is interested in the extent to which each section of an exami- 
nation determines the final reported score. If the scores on part I 
are more highly correlated with the total score than are the scores on 
part II, then part I may be said, according to this definition, to 
“weight” more heavily in the total score. Statistical means are also
available for determining appropriate weighting factors to give an 
item or a section a predetermined weight in the total reported score 
according to this definition. 
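
Under this third definition the weight of a part is read from its correlation with the total. A minimal sketch follows; the scores of the eight candidates are hypothetical, and `pearson` is a helper written for this illustration.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of eight candidates on two parts added without factors.
part_1 = [12, 30, 45, 22, 38, 50, 17, 41]   # widely spread scores
part_2 = [3, 4, 2, 5, 3, 4, 2, 5]           # narrowly spread scores
total = [a + b for a, b in zip(part_1, part_2)]

r_part_1 = pearson(part_1, total)
r_part_2 = pearson(part_2, total)
print(round(r_part_1, 3), round(r_part_2, 3))
```

In this invented case part I is almost perfectly correlated with the total while part II is only slightly so; although no multiplying factor was applied to either, part I "weights" far more heavily.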

Quite apart from which definition is accepted, one may investigate
the general influence of any set of weights on a total score. In the
usual testing situation we have a series of correlated item or part
scores, $X_1, X_2, X_3, \ldots, X_n$, which are combined in some linear fashion
to get a total score. We might say $S = k_1X_1 + k_2X_2 + k_3X_3 + \cdots + k_nX_n$,
where the weighting factors are expressed by $k$. It can be shown
mathematically that as n becomes larger the k’s may take on almost 
any values, within reasonable limits, and the correlation between two 
sets of scores obtained by the use of different values for the k’s becomes 
perfect. The relative order of the individuals tested becomes invari- 
ant for different weights as n becomes large. The more interrelated 
the items are, that is the more narrow and homogeneous the field 
being tested, the smaller n can be and still maintain a given high 
correlation. Generally speaking, when items on a restricted subject- 
matter test number around one hundred, the use of weighting factors, 
regardless of how they are arrived at, is rarely of any practical 
significance. 
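
The tendency described in this paragraph can be illustrated by simulation. The sketch below rests on assumptions that are not drawn from the Board's data: each item score is a candidate's ability plus independent error of equal size (so that any two items correlate about .5), and the factors simply cycle through 1, 2, 3, 4, 5.

```python
import math
import random

def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weighted_vs_unweighted(n_items, n_candidates=1000, seed=0):
    """Correlation between unweighted totals and totals using cyclic factors."""
    rng = random.Random(seed)
    factors = [1 + (i % 5) for i in range(n_items)]   # 1, 2, 3, 4, 5, 1, 2, ...
    plain, weighted = [], []
    for _ in range(n_candidates):
        ability = rng.gauss(0, 1)
        items = [ability + rng.gauss(0, 1) for _ in range(n_items)]
        plain.append(sum(items))
        weighted.append(sum(k * x for k, x in zip(factors, items)))
    return pearson(plain, weighted)

results = {n: weighted_vs_unweighted(n) for n in (5, 20, 100)}
for n, r in results.items():
    print(n, round(r, 4))
```

Under these assumptions the correlation between the two totals rises toward unity as the number of items grows, which is the behavior stated above.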

In essay tests where the number of scorable units (items) is smaller, 
or where sections, rather than items, are to be combined, weighting 
may have a greater influence. 

The factors which should be used to make each section or item 
contribute a predetermined amount to the total score according to the 
second and third definitions can be mathematically determined, 
although the process is not a simple one. Except under special
conditions, multiplying standard scores by predetermined or a priori 
weights does not accomplish the desired weighting in the final score 
according to either of these fundamental definitions. Before applying 
the elaborate statistical methods necessary to enforce precisely 
arbitrary a priori weights, the question should be raised as to the value 
of applying these weights. The concern should be not so much how
arbitrary weights can be precisely enforced as why they should be enforced at all.

Perhaps the most justifiable weights to be developed for use in the 
normal testing situation involve still another definition. The testers 
can use the best weights from the point of view of maximizing stability 
or reliability. No a priori judgment of the value of each part enters. 
Such weights may be determined, but, again, they are not necessarily 
proportional to the total possible scores or to the standard deviations. 
With the routine examinations, the labor involved in computing them
would be difficult to justify by the slight gain made through their
use.

A study of the results of the reading of the essay papers of the
College Entrance Examination Board throws some light on the actual 
influence of weights on the total score in the practical examining 
situation. The readers of these examinations, drawn from universities 
and secondary schools and numbering in the hundreds, meet in New 
York City each June to read the essay examinations which have been 
administered by the Board in various centers throughout the world. 
In a number of cases, a group of readers will devote a good deal of 
time to determining the appropriate weights for the questions. The 
discussions are usually based on the implicit idea that the maximum 
possible score is an appropriate index of the weight of the item. The 
usual fallacious assumption is made that a maximum of ten gives a 
question twice the weight of a question which has a maximum of five. 
Yet the Board grades are, and have been for some years, reported on 
a relative scale; therefore, the actual influence of a question on the 
total score is not a function of the maximum score. Sometimes the 
weights are included in the maximum values assigned by the readers; 
a question to be weighted heavily is scored on a larger number of 
points than is a question to be weighted less. Question six, for 
example, is graded on a ten-point scale, not because ten degrees of
merit can be determined, but because it is judged worth twice question
one, which has been given five points. Other readers grade the
papers according to the degrees of differentiation possible, but weight
the part scores by the use of multiplying factors. 

One examination of the College Board in which explicit weighting 
factors were used in June 1937 was Mathematics A, elementary algebra. 
Twelve questions were required of the candidates; but several questions 
were broken into subparts, so that nineteen separate scores were 
assigned to each paper. The maximum scores for the parts varied 
from two to six points. Five of the part scores were added as they 
were assigned (that is, the weighting factor was one); five more were 
multiplied by two, eight by three, and the remaining part by four. 
The examination was read with a reliability of .97 as determined by 
independent rereading of a sample of the papers. 

The most direct method of determining the actual influence of 
weights is to find the relationship between the weighted and the 
unweighted scores. Accordingly, on a sample of one hundred papers 
selected at random, the unweighted total scores were calculated (that 
is, the marks assigned were merely added without the use of the 
multiplying factors), and the correlation computed. The correlation
between the weighted and the unweighted scores on Mathematics A
is .99. There is a closer relationship between the weighted and the 
unweighted scores than there is between two independent readings 
of the same paper. 
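
The procedure is easily set down in code. In the sketch below only the pattern of multiplying factors is taken from the text; the hundred papers are simulated, the common zero-to-six scale for every part is a simplification, and the order in which the factors fall on the parts is arbitrary.

```python
import math
import random

def pearson(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Five parts taken as assigned, five doubled, eight tripled, one quadrupled.
FACTORS = [1] * 5 + [2] * 5 + [3] * 8 + [4]

def simulated_paper(rng):
    """One hypothetical paper: nineteen part scores on a zero-to-six scale."""
    ability = rng.gauss(0, 1)
    return [min(6, max(0, round(3 + ability + rng.gauss(0, 1)))) for _ in FACTORS]

rng = random.Random(1937)
papers = [simulated_paper(rng) for _ in range(100)]

unweighted = [sum(paper) for paper in papers]
weighted = [sum(k * s for k, s in zip(FACTORS, paper)) for paper in papers]
r = pearson(unweighted, weighted)
print(round(r, 3))
```

The figure printed describes only the simulated papers; it is not a reproduction of the Board's result, though it may be expected to fall in the same neighborhood.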

A similar study was made of the influence of weights on each of 
five other examinations given by the Board in the various branches 
of mathematics.¹ The results all point in the same direction. In
only one case, that of Mathematics E, was the relationship between 
the weighted and the unweighted scores lower than .99, and there it 
was somewhat over .98. Correlations of .99 indicate a practically 
linear relationship between the two scores; that is, with the Board’s 
system, no matter which set of scores is used, the candidates will 
receive essentially the same reported scores. 

Because it may be felt that the six mathematics examinations, 
each of which contained eleven or more scorable units, deal with 
highly restricted and homogeneous material, and therefore give 
atypical results, a study was made of other examinations. The English
examination was divided into six required questions, with maximum 
scores of 24, 8, 6, 16, 12, and 12. The readers desired to weight the 
questions so that the final maximum scores would be the following 
percentages of the total score: 20, 12, 12, 15, 7, and 34; that is, questions 
one and six were judged to be of much greater significance than any 
other question. The multiplying factors necessary to achieve approxi- 
mately the desired weights are: 14, 24, 32, 15, 9, and 34; certainly 
these are no simple weighting factors. In spite of the complexity, a 
sample of one hundred books was taken and a total weighted score 
computed on this basis. At the same time another score was com- 
puted by using the simpler factors of 1, 2, 2, 1, 1, and 3. When the 
correlation between the two scores was found to be .997, the simpler 
weighting system was used. After the reading period, the correlation 
between the scores weighted by the simpler system and the same 
scores merely added without the use of any multiplying factors was 
found to be .97. The reliability of reading the entire paper was .84. 
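
The share of the total maximum which each question receives under the simpler factors can be computed directly from the figures above. The sketch deals only with nominal maxima, which, as has been argued, are not a measure of actual influence; `percent_shares` is a helper written for this illustration.

```python
# Figures taken from the text.
max_scores = [24, 8, 6, 16, 12, 12]     # maximum score on each question
simple_factors = [1, 2, 2, 1, 1, 3]     # the simpler multiplying factors

def percent_shares(maxima, factors):
    """Each question's weighted maximum as a percentage of the total maximum."""
    weighted = [m * f for m, f in zip(maxima, factors)]
    total = sum(weighted)
    return [round(100 * w / total, 1) for w in weighted]

shares = percent_shares(max_scores, simple_factors)
print(shares)
```

The resulting shares, 20.7, 13.8, 10.3, 13.8, 10.3, and 31.0 per cent, may be set beside the desired 20, 12, 12, 15, 7, and 34.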





1 Mathematics B, with sixteen scorable units, the maximum scores on which 
varied from four to seven, used weighting factors of 1 and 2; Mathematics C, with 
sixteen scorable units the maximum scores on which varied from one to eight, used 
weighting factors of 1 and 2; Mathematics CD, with seventeen scorable units with 
maximum scores of one to ten, used weighting factors of 1, 2, and 3; Mathematics D, 
with eleven scorable units with maximum scores ranging from two to ten, used 
weighting factors of 1, 2, and 3; and Mathematics E, with fourteen scorable units
with maximum scores ranging from two to seven, used weights of 1, 2, and 3. 

The paper in American History (History D) was also made up of
six questions which the readers desired to weight equally, that is, to 
have the same maximum credits. The questions were read with the 
following maximum values: 18, 60, 10, 19, 30, and 30; the multiplying 
factors 3, 1, 6, 3, 2, and 2, were used to make these values equal. 
Yet even in this extreme case of six questions and weights varying 
from one to six, the correlation between the weighted and the 
unweighted scores was found to be over .97. 

The Chemistry group determined, on an a priori basis, values to 
be assigned to the various parts of their paper. The first part, 
required of all candidates, consisted of five questions, each broken 
into subparts and even smaller divisions, so that twenty-six different 
units were independently evaluated. The second part of the paper, 
offering the candidate a choice of any four of six given questions, was 
divided into thirty-eight scorable units. Although each question on 
the second part did not have the same number of subparts, the maxi- 
mum credit for each question totaled to twenty-four points. The 
usual assumption was made, incorrectly, that the various optional 
questions were being weighted equally because the total possible 
scores were equal. The score on the entire Chemistry paper was 
obtained by adding the scores given the parts; no multiplying factors 
were used since the values or maximum credits allowed for each 
scorable unit had already been balanced according to what was felt 
to be the appropriate weighting. What effect would different values 
have in this case? On a random sample of one hundred papers a new
score was obtained on the required part by using purely arbitrary and
indefensible multiplying weighting factors. The score for item 1a
was multiplied by 1, that for item 1b by 2, that for 1c by 3, for 1d by
1, the next by 2, the next by 3, etc.; in other words, each item score
was multiplied by 1, 2, or 3, according to its position in the paper. 
The total weighted score for the required part of the paper thus 
obtained correlated with the score for the same questions obtained 
without these weights, .99. This correlation indicates that even if 
quite different weights had been used, whether arrived at through 
serious discussion or by the patently nonsensical means used here, the 
results would have been almost exactly the same. 

It seems unnecessary to carry the study further. A comparable 
treatment of the papers in physics and biology and the languages 
would yield similar results, one may be reasonably certain. The 
relationship between the weighted and unweighted scores is so high, 
so nearly perfect, that there is little justification for the use of weights
with these examinations. Whether the scores are weighted or not, 
the grade reported to the candidate will be essentially the same. If, 
however, it is desirable for any reason to weight or equate various 
sections or items of an examination according to a priori judgment, 
one should first consider what definition of weighting agrees most 
nearly with what one has in mind. An appropriate statistical method
may then be selected to accomplish the desired result. Probably the 
most justifiable system of weights to be developed is one which maxi- 
mizes the reliability or consistency of the examination and which is 
independent of a priori judgments of the relative values of the 
questions. In practically all cases the statistical method of deter- 
mining weights involves not only the use of factors decided upon in 
advance but also the use of factors which are functions of indices of 
spread of the part scores and of the coefficients of correlation among 
the parts. Because of the complexity of these methods and of their 
small net effect, their use is seldom indicated in the typical test 
situation. The present practices in regard to weighting sections in 
tests are theoretically difficult to justify. 

In the ordinary subject-matter achievement examination which 
deals with a single field, and in which a number of part scores are 
assigned, the simplest and wisest rule for the readers to follow is to 
assign the maximum values for the parts according to the number of 
degrees of difference in the answers which can be consistently judged. 
The total score may then be obtained by merely adding the assigned 
part scores. The influence of the usual weighting factors is so small
as to be insignificant. 


STUDENTS’ TABLES OF THE UNIT NORMAL CURVE,
FOR ABSCISSAE EXPRESSED IN TERMS OF THE 
PROBABLE ERROR OR PE: I. AREAS CORRE- 
SPONDING TO ABSCISSAE. II. ABSCISSAE 
CORRESPONDING TO AREAS 


HERBERT S. CONRAD AND RUTH H. KRAUSE*
Institute of Child Welfare, University of California 


Textbooks of statistics in psychology and education generally 
include two tables of areas under the unit normal curve. One of these 
tables presents areas corresponding to abscissae expressed in terms of σ,
the other presents areas corresponding to abscissae expressed in terms
of the probable error or PE. The second of these tables, however, is
in all texts much less complete than the first; thus, in the text by
Garrett, Table 15 (giving areas for σ-deviations) contains three
hundred thirty-one entries; but Table 16 (giving areas for PE-devia- 
tions) contains only one hundred twenty entries. Characteristically, 
too, the tables of areas for abscissae expressed in terms of PE are not 
altogether free from error.† The origin of both these shortcomings
appears to rest in Rugg’s 1917 table. This table (one of the least
accurate of those published on the normal curve) has been widely
copied by writers of statistical texts in psychology and education. 

Table I of the present paper states the percentage of cases found 
in a normal curve between the mean and a given deviation expressed 
in terms of PE. The number of entries in the table is five hundred
fifty-one; and the entries may be accepted as correct to the last decimal
presented.‡ Although most tables of the normal curve have recorded
areas, each entry of Table I represents a percentage of the total number
of cases. The majority of persons using a table of the normal curve
think in terms of percentage, and there is no good reason why this
habit of thought should not be respected.

* The writers are indebted to Professors H. D. Carter and R. C. Tryon for
reading and criticism of the manuscript. Acknowledgment is also made to Mr.
Louis Chan and the National Youth Administration of the University of California
for intelligent and efficient assistance in the mathematical calculations.

† An exception is Garrett’s short table in the revised edition of his well-known
text, and Holzinger’s still shorter table. Reference 10 contains a
section listing errors in previously published tables of areas under the unit normal
curve, for abscissae expressed in terms of PE.

‡ The great majority of entries were calculated correct to at least four figures
beyond the number given in Table I. Every entry has been calculated independently
at least twice. Details concerning the methods of calculation and check
employed may be found in a forthcoming technical paper by the writers (R. H.
Krause and H. S. Conrad).


TaBLeE I.—PERCENTAGE OF CASES BETWEEN MEAN AND DEVIATION IN A NORMAL 
CurRvVE 
(Deviations Expressed in Terms of PE) 





















































Devi- 
— 00 01 02 03 04 05 06 07 08 09 
(z/PE) 
0 00 27° BA 81 1.08 1.35— | 1.61 1.88 2.15+ | 2.42 
"1 | 2/69 2.96 3.23 3.49 3.76 4.03 4 30 4.56 4.83 5.10 
"2 | 5.37 5.63 5.90 6.16 6.43 6.70 6.96 7.23 7.49 7.754 
"3 | 8.02 8.28 8 54 8.81 9.07 9.33 9.59 985+ | 10.11 | 10.37 
"4 | 10.63 | 10.89 | 11.15+] 11.41 | 11.67 | 11.93 | 12.18 | 12.44 | 12.69 | 12.95- 
"5 | 13.20 | 13.46 | 13.71 | 13.96 | 14.22 | 14.47 | 14.72 | 14.97 | 15.22 | 15.47 
"6 | 15.71 | 15.96 | 16.21 | 16.46 | 16.70 | 16.95—| 17.19 | 17.43 | 17.68 | 17.92 
"7 | 18.16 | 18.40 | 18.64 |1888 | 19.12 | 19.35+ 119.59 | 19.82 | 2006 | 20.29 
"g | 20.53 | 20.76 | 20.99 | 21.22 | 21.45— | 21.68 | 21.91 | 2213 | 22.36 | 22.68 
"9 | 22.81 | 23.03 | 23.254+|23.48 | 23.70 | 23.92 | 24.13 | 24.354 | 24.57 | 24.79 
1.0 | 25.00 | 25.21 | 25.43 | 25.64 | 25.85— | 26.06 | 26.27 | 26.48 | 26.68 | 26.89 
1.1 | 27.09 | 27.30 | 27.50+ | 27.70 | 27.90 | 28.10 | 28.30 | 28.50— | 28.70 | 28.89 
12 | 29.09 | 29.28 | 2947 | 29.66 | 2985+ | 30.04 | 30.23 | 30.42 | 30.60 | 30.79 
13 | 30.97 | 31.15+ | 31.34 | 31.52 | 31.70 | 31.87 | 32.05+ | 32.23 | 32.40 | 32.58 
1.4 | 32.75— | 32.92 | 33.00 | 33.26 | 33.43 | 33.60 | 33.76 | 33.93 | 34.09 | 34.25+ 
15 | 34.42 | 34.58 | 34.74 | 34.90 | 35.05+ | 35.21 | 35.36 | 35.52 | 35.67 | 35.82 
1.6 | 35.97 | 36.12 | 36.27 | 36.42 | 36.57 | 36.71 | 36.86 | 37.00 | 37.14 | 37.28 
1.7 | 37.42 | 37.56 | 37.70 | 37.84 |37.97 | 38.11 | 38.24 | 38.37 | 38.50+ | 38.63 
1.8 | 38.76 | 38.89 | 39.02 | 39.15— | 39.27 | 39.39 | 39.52 | 39.64 | 39.76 | 39.88 
1.9 | 40.00 | 40.12 | 40.23 | 40.35+ | 40.46 | 40.58 | 40.69 | 40.80 | 40.91 | 41.02 
2.0 | 41.13 | 41.24 | 41.35— | 41.45+ | 41.56 | 41.66 | 41.77. | 41.87 | 41.97 | 42.07 
21 |42.17 | 42.27 | 42.36 | 42.46 | 42.554 | 42.65— | 42.74 | 42.84 | 42.93 | 43.02 
2.2 | 43.11 | 43.20 | 43.29 | 43.37 | 43.46 | 43.54 | 43.63 | 43.71 | 43.80 | 43.88 
23 | 43.96 | 44.04 | 44.12 | 44.20 | 44.28 | 44.35+ | 44.43 | 4450+ | 44.58 | 4465+ 
24 | 44.73 | 44.80 | 44.87 | 44.94 | 45.01 | 45.08 | 45.15— | 45.21 | 45.28 | 45.35— 
25 | 45.41 | 45.48 | 45.54 | 45.60 | 45.67 | 45.73 | 45.79 | 45.85— | 45.91 | 45.97 
26 | 46.03 | 46.08 | 46.14 | 46.20 | 4625+ | 46.31 | 46.36 | 4641 | 46.47 | 46.52 
2.7 | 46.57 | 46.62 | 46.67 | 46.72 | 46.77 | 46.82 | 46.87 | 46.91 | 46.96 | 47.01 
2.8 | 47.05+ | 47.10 | 47.14 | 47.19 | 47.23 | 47.27 | 47.31 | 47.36 | 47.40 | 47.44 
29 | 47.48 | 47.52 | 47.56 | 47.59 | 47.63 | 47.67 | 47.71 | 47.74 | 47.78 | 47.81 
3.0 | 47.85—| 47.88 | 47.92 | 47.95+ | 47.98 | 48.02 | 48.05—| 48.08 | 48.11 | 48.14 
3.1 | 48.17 | 48.20 | 48.23 | 48.26 | 48.29 | 48.32 | 48.35— | 48.37 | 48.40 | 48.43 
3.2 | 48.46 | 48.48 | 48.51 | 48.53 | 48.56 | 48.58 | 48.61 | 48.63 | 48.65+ | 48.68 
3.3 | 48.70 | 48.72 | 48.74 | 48.76 | 48.79 | 48.81 | 48.83 | 48.85— | 48.87 | 48.89 
3.4 | 48.91 | 48.93 | 48.95— | 48.97 | 48.98 | 49.00 | 49.02 | 49.04 | 49.05+ | 49.07
3.5 | 49.09 | 49.10 | 49.12 | 49.14 | 49.15+ | 49.17 | 49.18 | 49.20 | 49.21 | 49.23 
3.6 | 49.24 | 49.26 | 49.27 | 49.28 | 49.30 | 49.31 | 49.32 | 49.33 | 49.35— | 49.36 
3.7 | 49.37 | 49.38 | 49.39 | 49.41 | 49.42 | 49.43 | 49.44 | 49.45+ | 49.46 | 49.47
3.8 | 49.48 | 49.49 | 49.50+ | 49.51 | 49.52 | 49.53 | 49.54 | 49.55— | 49.56 | 49.57
3.9 | 49.57 | 49.58 | 49.59 | 49.60 | 49.61 | 49.61 | 49.62 | 49.63 | 49.64 | 49.64 
4.0 | 49.65+ | 49.66 | 49.67 | 49.67 | 49.68 | 49.68 | 49.69 | 49.70 | 49.70 | 49.71 
4.1 | 49.72 | 49.72 | 49.73 | 49.73 | 49.74 | 49.74 | 49.75— | 49.75+ | 49.76 | 49.76
4.2 | 49.77 | 49.77 | 49.78 | 49.78 | 49.79 | 49.79 | 49.80 | 49.80 | 49.81 | 49.81 
4.3 | 49.81 | 49.82 | 49.82 | 49.83 | 49.83 | 49.83 | 49.84 | 49.84 | 49.84 | 49.85— 
4.4 | 49.85+ | 49.85+ | 49.86 | 49.86 | 49.86 | 49.87 | 49.87 | 49.87 | 49.87 | 49.88 
4.5 | 49.88 | 49.88 | 49.89 | 49.89 | 49.89 | 49.89 | 49.89 | 49.897 | 49.900 | 49.902
4.6 | 49.904 | 49.906 | 49.908 | 49.910 | 49.912 | 49.914 | 49.916 | 49.918 | 49.920 | 49.922 
4.7 | 49.924 | 49.926 | 49.927 | 49.929 | 49.931 | 49.932 | 49.934 | 49.935+ | 49.937 | 49.938
4.8 | 49.940 | 49.941 | 49.943 | 49.944 | 49.945+ | 49.946 | 49.948 | 49.949 | 49.950+ | 49.951
4.9 | 49.953 | 49.954 | 49.955— | 49.956 | 49.957 | 49.958 | 49.959 | 49.960 | 49.961 | 49.962





* This entry (as an illustration) is to be read: When the deviation (or abscissa) equals .01 PE, the percentage of cases
between the mean and this deviation is .27.
TABLE I.—Continued
(Deviations from 5.0 to 10.0 PE)

Deviation or abscissa (x/PE) | Percentage of cases | Deviation or abscissa (x/PE) | Percentage of cases
5.0 | 49.963* | 7.5 | 49.999 979
5.1 | 49.971 | 7.6 | 49.999 985+
5.2 | 49.977 | 7.7 | 49.999 989 7
5.3 | 49.982 | 7.8 | 49.999 992 8
5.4 | 49.986 | 7.9 | 49.999 995 0+
5.5 | 49.989 6 | 8.0 | 49.999 996 6
5.6 | 49.992 1 | 8.1 | 49.999 997 7
5.7 | 49.994 0 | 8.2 | 49.999 998 4
5.8 | 49.995 4 | 8.3 | 49.999 998 9
5.9 | 49.996 5+ | 8.4 | 49.999 999 27
6.0 | 49.997 4 | 8.5 | 49.999 999 51
6.1 | 49.998 1 | 8.6 | 49.999 999 67
6.2 | 49.998 6 | 8.7 | 49.999 999 78
6.3 | 49.998 9 | 8.8 | 49.999 999 85+
6.4 | 49.999 21 | 8.9 | 49.999 999 903
6.5 | 49.999 42 | 9.0 | 49.999 999 936
6.6 | 49.999 57 | 9.1 | 49.999 999 958
6.7 | 49.999 69 | 9.2 | 49.999 999 973
6.8 | 49.999 77 | 9.3 | 49.999 999 982
6.9 | 49.999 84 | 9.4 | 49.999 999 989
7.0 | 49.999 88 | 9.5 | 49.999 999 992 6
7.1 | 49.999 916 | 9.6 | 49.999 999 995 3
7.2 | 49.999 940 | 9.7 | 49.999 999 997 0
7.3 | 49.999 958 | 9.8 | 49.999 999 998 1
7.4 | 49.999 970 | 9.9 | 49.999 999 998 8
7.5 | 49.999 979 | 10.0 | 49.999 999 999 23

* This entry (as an illustration) is to be read: When the deviation (or abscissa)
equals 5.0 PE, the percentage of cases between the mean and this deviation is
49.963.


Table I is sufficiently complete to meet all ordinary demands of 
elementary statistical work; a shorter table would fail to give accurate 
answers without interpolation. To prevent error in the dropping of 
decimals (by those who may require fewer figures than given in Table I), 
entries with a terminal “5” (or “50”) have been suffixed by a “+”
or a “—”: thus, in the first line of the table, we find that the percentage
of cases corresponding to a deviation of .08 PE is 2.15+; correct to one
decimal, this should be read as 2.2 per cent. Analogously, in the fifth
line of the table, we find that the percentage of cases corresponding to a
deviation of .49 PE is 12.95—; correct to one decimal, this should be
read as 12.9 per cent.
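
The reading of Table I and the suffix convention just described can be checked with a short sketch. It assumes Python's standard `statistics.NormalDist`; the names `PE_IN_SIGMA`, `percent_between_mean_and`, and `table_entry` are hypothetical, and the ASCII characters `+` and `-` stand in for the printed suffixes. It is offered only as a check on the arithmetic, not as the procedure by which the table was calculated.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1

# One probable error in standard-deviation units (about 0.6745): the
# deviation that cuts off 25 per cent of the cases on one side of the mean.
PE_IN_SIGMA = _UNIT_NORMAL.inv_cdf(0.75)


def percent_between_mean_and(x_pe: float) -> float:
    """Percentage of cases between the mean and a deviation of x_pe PE."""
    return 100.0 * (_UNIT_NORMAL.cdf(x_pe * PE_IN_SIGMA) - 0.5)


def table_entry(x_pe: float, decimals: int = 2) -> str:
    """Round to `decimals` places and mark a terminal 5 (or 50).

    '+' means the unrounded value lies above the printed figure,
    '-' means it lies below, so a further digit can be dropped safely.
    """
    exact = percent_between_mean_and(x_pe)
    rounded = round(exact, decimals)
    text = f"{rounded:.{decimals}f}"
    if text.endswith("5") or text.endswith("50"):
        text += "+" if exact > rounded else "-"
    return text


if __name__ == "__main__":
    for x in (0.01, 0.08, 0.49, 1.00):
        print(x, table_entry(x))
```

Under these assumptions the sketch reproduces the two illustrations in the text: a deviation of .08 PE gives 2.15 with a plus suffix, and a deviation of .49 PE gives 12.95 with a minus suffix.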

The second page of Table I gives percentages corresponding to 
deviations from 5.0 to 10.0 PE. The figures in this section have been 
included partly for their occasional usefulness, and partly to combat 
the notion (too often entertained by those who have learned their 
statistics “on the run”) that the normal curve really ends at ±4 PE.

Problems in elementary statistics not infrequently require the 
reading, from a table, of the normal-curve deviation corresponding to 
a given percentage of cases above (or below) the mean: this is, of 
course, simply the inverse of reading the percentage of cases corre- 
sponding to a given deviation. For such inverse problems, it is con- 
venient (in order to eliminate the need for interpolation) to have a 
separate, inverse table. The Kelley-Wood,* Kondo-Elderton,’ and 
Conrad-Krause' tables fill this need, for deviations expressed in terms 
of the standard deviation; and tables in which the deviations are 
expressed in terms of PE have been published by Holzinger*® and by 
Dunlap and Kurtz.?* Restricting attention to the tables of PE-devia- 
tions by Holzinger and by Dunlap and Kurtz, it is of interest that 
neither of these tables extends beyond 49.9 per cent of the total cases; 
and that the fixed interval of these tables (namely, .1 per cent) may 
in the upper region of the tables be considered relatively coarse (in the 
sense that use of the tables in this region is likely to require inter- 
polation). Further, both these tables present a larger number of 
decimals than is convenient for ordinary use; and neither of the tables 
is, in every instance, accurate to the last decimal presented.†

Table II of the present paper states the PE-deviation corresponding 
to a given percentage of cases above (or below) the mean of a normal 
curve. The PE-deviations are given uniformly to two decimals; as 
in the case of Table I, each terminal “5” (or ‘‘50”) has been suffixed 





*Since the present paper was written, another such table has been published 
by Krause and Conrad.}% 

† In Holzinger’s table,‘ the value of x/PE is too small by one in the last (fourth)
decimal for the areas .032, .055, and .476, respectively; and too large by one in the
last decimal for the area .009. In the table by Dunlap and Kurtz,? the value of
x/PE is too small by one in the last (fifth) decimal for the areas .072, .083, .084,
.118, .125, .216, .240, .388, .408, .428, .439, and .499, respectively; and too large 
by one in the last decimal for the areas .064, .106, .169, .201, .451, .489, and .493, 
respectively. (For details concerning the calculations and checks on which these 
corrections are based, cf. the technical paper mentioned in the preceding footnote.) 
by a “+” or a “—” sign, to prevent error in the dropping of decimals.
The table extends to a percentage of cases which is virtually as high 
as the final entry in the companion Table I (viz., to 49.999 999 999 per 
cent). Above the percentage 45.00, the interval of Table II is .01 per 
cent, up to the percentage 49.99; beyond 49.99, the interval becomes 
smaller, by stages, until in the last line of the table it is .000 000 001 
per cent. Unless exceptional accuracy is required, Table II is suffi- 
ciently complete to meet all ordinary demands of elementary statistical 
work without the use of interpolation. The number of entries in 
Table II is ten hundred thirteen; and these entries may be accepted, 
with confidence, as correct to the number of decimals presented.* 
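
The inverse reading supplied by Table II can be sketched in the same way. The function name `pe_deviation` is hypothetical, and the sketch again assumes Python's standard `statistics.NormalDist`; it illustrates the relation between the two tables and is not a statement of how the entries were computed.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1
PE_IN_SIGMA = _UNIT_NORMAL.inv_cdf(0.75)  # one PE in sigma units, about 0.6745


def pe_deviation(percent_from_mean: float) -> float:
    """PE-deviation that encloses the given percentage of cases,
    counted from the mean on one side of a normal curve."""
    if not 0.0 <= percent_from_mean < 50.0:
        raise ValueError("percentage must lie in the interval [0, 50)")
    sigma_units = _UNIT_NORMAL.inv_cdf(0.5 + percent_from_mean / 100.0)
    return sigma_units / PE_IN_SIGMA


if __name__ == "__main__":
    for pct in (1.0, 25.0, 49.990):
        print(pct, round(pe_deviation(pct), 2))
```

Rounded to two decimals, the sketch agrees with the illustrative entries of Table II: 1.0 per cent corresponds to .04 PE, and 49.990 per cent to 5.51 PE.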

The purpose of Tables I and II is to permit students to solve prob- 
lems relating to the normal curve with the same ease and accuracy 
when using PE-deviations, as when using σ-deviations. The utility
and convenience of a lithoprinted edition of Table I, in connection 
with several common texts in statistics, has been tested in the last 
two years by students at the University of California and the Uni- 
versity of Oregon. Table II has not yet been subjected to such 
practical trial. 

A word may be said, finally, concerning the relative advantages of
PE-units versus σ-units. There is no doubt in our minds that for
elementary students (and relatively few persons advance beyond the
elementary stage in statistics) the probable error or PE represents a
more easily grasped concept than the standard deviation or σ. The
area in a normal curve between the mean and ±1 PE includes exactly
fifty per cent of the cases, a round number with obvious significance.
The area between the mean and ±1 σ includes 68.26 per cent of the
cases, and it is not easy for many persons to see why the deviation
corresponding to such a peculiar percentage should be given cardinal 
significance. In this connection, it is interesting to observe that tables 
of the normal curve for abscissae in terms of PE were early constructed 
by such men as Gauss,!+"?-59-69 Encke,? and W. W. Johnson’; these 
tables far antedate the table by Sheppard" using σ as the unit. From
the practical computational standpoint it is of course true that the
probable error is obtained from the standard deviation; so that the use
of the probable error might be thought to involve one extra computational
step. But this is not the case for many of the more common
statistical calculations, if use is made of the well-known table of





* The great majority of the entries in Table II were calculated correct to four 
or five figures beyond the number given in the table. 




The Journal of Educational Psychology 





TABLE II.—DEVIATION CORRESPONDING TO PERCENTAGE OF CASES ABOVE
(OR BELOW) THE MEAN OF A NORMAL CURVE
(Deviations Given in Terms of PE)






























































Percentage 

of cases 

ores st 2 3 4 7 8 9 
mean 
0 00 | .00 | .o1 | .o1 | .01 03 | .03 | .03 
1 04* | .04 | .04 | .05—| .05+ 06 | .07 | .07 
2 07 | .08 | .08 | .09 | .09 Ditties 2 
3 Sct eS eS Sh ae aes 
4 15—| .15+| .16 | .16 | .16 1s | .18 | 118 
5 9 | .19 | .19 | .20 | .20 21 | .22 | «22 
6 22 | .23 | .23 | .24 | .24 .25+| .25+| .26 
7 2 | .27 | .27 | .27 | .28 29 | .29 | .30 
* ie Le ce ee a+ 2) 2 
9 4 | .84 | .84 | .86-| .364/ . 6 | .87 | .87 
10 38 | .38 38 | .39 | .39 40 | .41 | 041 
11 41 | .42 | .42 | .43 | .48 44 | .45-| .45- 
12 45+| .46 | .46 | .46 | .47 48 | .48 | .49 
13 49 | .50—| .50-| .50+| .51 62 | .52 | .83 
14 53 | .54 | .64 | .54 | .55- 56 | .56 | .57 
15 67 | .58 | .58 | .58 | .59 .60 | .60 | .61 
16 61 | .62 | .62 | .62 | .63 64 | .64 | .65- 
17 .65+| .66 | .66 | .66 | .67 .68 | .69 | .69 
18 1. i Mim 1. 73 | .%8 | .7 
19 1% | .74 | .74 | .76—| .76+ 76) 7 | oa 
20 m1 | 1 2] oe 81 | .81 | .82 
21 Te ae coe woe 85+) .86 | .86 
22 86 | .87 | .87 | .88 | .88 90 | .90 | .90 
23 o1 | .91 | .92 | .92 | .98 | 94 | .94 | .95- 
24 .95+| .96 | .96 | .97 | .97 | . 99 | .99 | 1.00 
25 1.00 | 1.00 | 1.01 | 1.01 | 1.02 | 1. 1.03 | 1.04 | 1.04 
26 =| 1.05—| 1.05+| 1.06 | 1.06 | 1.07 | 1. 1.08 | 1.09 | 1.09 
27 1.10 | 1.10 | 1.11 | 1.11 | 1.12 | 1. 1.13 | 1.13 | 1.14 
28 1.14 | 1.15—! 1.154) 1.16 | 1.16 | 1. 1.18 | 1.19 | 1.19 
29 1.20 | 1.20 | 1.21 | 1.21 | 1.22 | 1. 1.23 | 1.24 | 1.24 
30 1.25—| 1.254| 1.26 | 1.26 | 1.27 | 1. 1.29 | 1.29 | 1.30 
31 1.30 | 1.31 | 1.31 | 1.32 | 1.32 | 1. 1.34 | 1.35—| 1.35+ 
32 1.36 | 1.36 | 1.37 | 1.37 | 1.38 | 1. 1.40 | 1.40 | 1.41 
33 1.41 | 1.42 | 1.43 | 1.43 | 1.44 | 1. 1.46 | 1.46 | 1.47 
34 1.47 | 1.48 | 1.49 | 1.49 | 1.50-/ 1. 1.52 | 1.52 | 1.53 
35 1.54 | 1.54 | 1.55-| 1.56 | 1.56 | 1. 1.58 | 1.59 | 1.60 
36 1.60 | 1.61 | 1.62 | 1.62 | 1.63 | 1. 1.65—| 1.66 | 1.66 
37 1.67 | 1.68 | 1.68 | 1.69 | 1.70 | 1. 1.72 | 1.73 | 1.73 
38 1.74 | 1.75-| 1.76 | 1.76 | 1.77 | 1. 1.80 | 1.80 | 1.81 
39 1.82 | 1.83 | 1.83 | 1.84 | 1.854] 1. 1.87 | 1.88 | 1.89 
40 1.90 | 1.91 | 1.92 | 1.93 | 1.93 | 1.94 1.96 | 1.97 | 1.98 
41 1.99 | 2.00 | 2.01 | 2.02 | 2.02 | 2.03 2.05+| 2.06 | 2.07 
42 2.08 | 2.09 | 2.10 | 2.11 | 2.12 | 2.13 2.16 | 2.17 | 2.18 
43 2.19 | 2.20 | 2.21 | 2.22 | 2.23 | 2.24 2.27 | 2.28 | 2.29 
44 2.31 | 2.32 | 2.33 | 2.34 | 2.36 | 2.37 2.40 | 2.41 | 2.42 





* This entry (as an illustration) is to be read: When the percentage of cases from the mean of a
normal curve equals 1.0, the deviation on the X-axis is .04 PE.
TABLE II.—Continued
(Percentages from 49.990 on; Deviations Given in Terms of PE)

Percentage of cases from the mean | Deviation or abscissa (x/PE) | Percentage of cases from the mean | Deviation or abscissa (x/PE)
49.990 | 5.51* | 49.999 995 | 7.90
.991 | 5.55+ | .999 996 | 7.96
.992 | 5.60 | .999 997 | 8.03
.993 | 5.65— | .999 998 | 8.14
.994 | 5.70 | 49.999 999 | 8.32
.995 | 5.77 | 49.999 999 0 | 8.32
.996 | 5.85— | .999 999 1 | 8.35—
.997 | 5.95— | .999 999 2 | 8.38
.998 | 6.09 | .999 999 3 | 8.41
49.999 | 6.32 | .999 999 4 | 8.45+
49.999 0 | 6.32 | .999 999 5 | 8.50—
.999 1 | 6.36 | .999 999 6 | 8.55+
.999 2 | 6.40 | .999 999 7 | 8.62
.999 3 | 6.44 | .999 999 8 | 8.72
.999 4 | 6.49 | 49.999 999 9 | 8.89
.999 5 | 6.55— | 49.999 999 90 | 8.89
.999 6 | 6.62 | .999 999 91 | 8.92
.999 7 | 6.71 | .999 999 92 | 8.95—
.999 8 | 6.84 | .999 999 93 | 8.98
49.999 9 | 7.05— | .999 999 94 | 9.01
49.999 90 | 7.05— | .999 999 95 | 9.06
.999 91 | 7.08 | .999 999 96 | 9.11
.999 92 | 7.11 | .999 999 97 | 9.18
.999 93 | 7.15+ | .999 999 98 | 9.27
.999 94 | 7.20 | 49.999 999 99 | 9.43
.999 95 | 7.25+ | 49.999 999 990 | 9.43
.999 96 | 7.32 | .999 999 991 | 9.46
.999 97 | 7.40 | .999 999 992 | 9.48
.999 98 | 7.52 | .999 999 993 | 9.51
49.999 99 | 7.71 | .999 999 994 | 9.55—
49.999 990 | 7.71 | .999 999 995 | 9.59
.999 991 | 7.74 | .999 999 996 | 9.64
.999 992 | 7.77 | .999 999 997 | 9.70
.999 993 | 7.81 | .999 999 998 | 9.79
49.999 994 | 7.85— | 49.999 999 999 | 9.94





* This entry (as an illustration) is to be read: When the percentage of cases from 
the mean of a normal curve equals 49.990, the deviation on the X-axis is 5.51 PE. 





χ₁ (published both in Pearson’s"' and Holzinger’s® Tables); moreover,
one may without computation obtain the PE’s corresponding to a
wide range of σ’s, merely by reference to the table by Dunlap and
Kurtz?”-® prepared specifically for this purpose. Authors of statistical
texts should make these tables available to their readers. In
the writers’ judgment, even if the use of the PE-unit occasionally
involved some slight additional computation, this would be more than
compensated by the greater comprehensibility and meaningfulness of
the PE- over the σ-unit for the majority of beginners in statistics.
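
The relation between the two units discussed above can be put in a few lines. The sketch assumes only the standard relation PE = 0.6745 σ (approximately) for a normal curve; the names `PE_PER_SIGMA`, `pe_from_sigma`, and `percent_within` are hypothetical and serve for illustration.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1

# Probable error of a normal curve whose standard deviation is 1
# (approximately 0.6745).
PE_PER_SIGMA = _UNIT_NORMAL.inv_cdf(0.75)


def pe_from_sigma(sigma: float) -> float:
    """Convert a standard deviation into the corresponding probable error."""
    return PE_PER_SIGMA * sigma


def percent_within(k: float, unit_in_sigma: float) -> float:
    """Percentage of cases lying within plus or minus k units of the mean,
    where one unit equals `unit_in_sigma` standard deviations."""
    z = k * unit_in_sigma
    return 100.0 * (_UNIT_NORMAL.cdf(z) - _UNIT_NORMAL.cdf(-z))


if __name__ == "__main__":
    print(pe_from_sigma(10.0))              # PE for sigma = 10
    print(percent_within(1, PE_PER_SIGMA))  # within one PE of the mean
    print(percent_within(1, 1.0))           # within one sigma of the mean
```

The single multiplication in `pe_from_sigma` is the "extra computational step" referred to in the text; the two calls to `percent_within` contrast the round fifty per cent enclosed by one PE with the roughly 68.3 per cent enclosed by one σ.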


SUMMARY 


Two tables of the unit normal curve are presented. Table I 
states the percentage of cases found in a normal curve between the 
mean and a given deviation expressed in terms of the probable error 
or PE. Table II, the inverse of Table I, states the PE-deviation
corresponding to a given percentage of cases above (or below) the 
mean of a normal curve. Both Tables I and II are accurate to the 
last decimal presented, and contain a considerably larger number of 
entries than previous tables of their type. Unless exceptional accuracy 
is required, the tables may be applied without need of interpolation. 
The purpose of Tables I and II is to permit easy and rapid use of the 
normal curve in the solution of problems involving deviations expressed 
in terms of PE. It is urged that, for most beginners in statistics, the 
probable error or PE represents a more easily grasped and more 
meaningful concept than the standard deviation, and hence deserves 
preference over the latter in most statistical texts. 
THE STABILITY OF NEW-TYPE QUESTIONS*


DOROTHY M. ANDREW 


Pennsylvania College for Women 
AND 
CHARLES BIRD 


University of Minnesota 


Of importance to educators who use new-type tests for measuring 
academic achievement is a knowledge of the degree to which questions 
will continue to differentiate students with various letter grades in 
examinations. No longer can we assume that examination items are 
valid and stable because they command intuitive acceptance. In 
the construction of objective tests, care and interest are inadequate to 
achieve discriminating instruments unless supplemented by item 
analyses.'* After having achieved items which are satisfactory in 
an examination, an instructor does not know if they will remain stable 
upon repetition with the same group of students or with another 
group selected at a later date. Assertions have been made that 
examinations ‘‘get out” or that items are remembered by students to 
be filed for parasitic associates, the implication being that new-type 
questions should not be repeated. If these assertions are true, they 
deserve serious consideration. But mere opinions cannot settle the 
question of the discriminating capacity of items upon repetition. No 
instructor, of course, will desire to make tests available through any 
avenues until he has accumulated a number of items sufficiently large 
to deter a student from studying tests as a substitute for studying his 
courses. Yet an instructor can hardly afford to forego the repetition 
of items in succeeding examinations because of paranoic fears. The 
construction of new-type tests originally is time-consuming.? Only 
when items of marked validity can be utilized, can the instructor save 
time with new-type tests. And, only as these items withstand the 
many conditions potentially capable of ruining their validity, do they 
deserve a place in new examinations. 

As has been reported,’ a large number of objective questions have 
been validated in the General Psychology course at the University of 
Minnesota. Of the forty-two hundred two items in the files at the 





* The writers acknowledge their indebtedness to the National Youth Adminis- 
tration for the services of Federal Aid Students in validating examinations given 
in the Department of Psychology at the University of Minnesota. 
close of the academic year 1936, twenty-two hundred sixty-three had
been analyzed. At this date only nine hundred twenty-seven of the 
validated items had been used more than once. Concern over the 
possibility of items being remembered and recorded by students for 
future reference, and, in spite of extraordinary safeguards, fear that 
examination copies had become circumscribed public property, are 
responsible for questions of known differentiating power not having 
been repeated. The number of validated questions repeated, never- 
theless, is large enough to justify comparisons among the four types of 
items used and to permit interpretations designed to dispel timidities 
standing in the way of utilizing again the best of these questions. 

Certain conditions underlying the construction and validation of 
these items recommend inquiry into their stability. Usually, each 
examination has been taken by approximately five hundred students. 
A large number of students is necessary to justify confidence in item 
analyses, particularly when the validation is expressed as percentages 
of students who pass each item and who are assigned letter grades 
ranging from A to F on the examination. Validity means the degree 
to which each item differentiates students in the same manner as the 
total examination score. Unless large classes are available, the pre- 
vailing academic practice of assigning grades according to a five-fold 
division results in too few students representing the extremes of the 
grading system. It is also important to measures of stability that 
approximately the same grading procedure should be maintained. 
This also has been achieved. From time to time, distributions of 
scores have been remarkably constant, ranges have been wide, curves 
have been normal, and, from examination to examination, the same 
percentages of letter-grades have been allotted. Finally, the spacing 
of examinations has been uniform. In both Psychology 1 and Psy- 
chology 2, hour tests have been given at the conclusion of the third 
and sixth weeks of the term and final examinations of two hours dura- 
tion have followed. The stability of items may be studied, therefore, 
when tests have been taken by groups of students having had similar 
opportunities to become acquainted with lectures and assignments and 
when unequal learning opportunities have prevailed. Items may be 
compared which have been repeated in first hour tests given in different 
years, or when taken from a first hour test to be repeated in a second 
hour test or final examination during the same year. Different com- 
parisons reveal the resistance of items to innumerable conditions 
thought of as detrimental to stability. 
There are probably three lines of interest to be satisfied in estimating
the value of repeated items. One concerns the proportion of
items maintaining discriminating power. Not all repeated items are 
chosen because of their marked validity although high quality has 
been the major recommendation. Some items were repeated without 
knowledge of their quality. They “looked good” and yet were 
deceiving. Other items missed being labelled excellent by degrees 
easily attributed to chance. Experience suggested a repetition would 
show their usefulness. Items discriminating perfectly the various 
levels of academic performance may fail to do so again while other 
items may prove of greater merit. What then will be the correspond- 
ence in categories of discriminating power from first to second usage? 

Another consideration is the relative stability of different types of 
questions. Will items of equal value remain equal or will, for example, 
single-word completion questions prove here as elsewhere their 
superiority for examination purposes? To answer this question we 
must know exactly what happens to each item upon its repetition. 

Our third interest is in the percentages of students passing items 
which have been used before. Conceivably items could retain their 
status as valid and yet be passed by increasingly higher percentages of 
all grades of students. Such a result is not likely although, in levelling 
attacks against the practice of repeating items, opponents have little 
difficulty in assuming it to be true. An analysis of items used within 
the same term should do much to prove or disprove the assertion that 
previous contact with a question destroys its usefulness. If students 
taking the examinations cannot profit significantly from contacts with 
items it is less likely that associates will profit more. 

A brief explanation of the categories into which questions are 
placed may be necessary although extended discussion is found else- 
where (pp. 247-251).* Perfect discrimination means that a question 
is passed by decreasing percentages of students in the letter-grade 
series from A grade to F grade. If a question results in a single inver- 
sion, for instance, if a higher percentage of D students than of C 
students pass an item, it is assigned to the category of one-letter grade 
displacement. When more than one inversion occurs, the criterion 
used is whether the question differentiates combined A-B grade 
students from the D-F students. Essentially, this is the criterion 
used by a number of investigators to select good and poor items. In 
the psychology course the extremes of grade distributions have 
included the highest and lowest twenty-five per cent of the total 
group. It seems to the writers that an item may be poor and yet
separate the highest and lowest quarters of a class. Yet some investigators
seem to assume that items are satisfactory if they do this or if
they separate the highest third from the lowest third of the students
taking examinations. In the present study, questions meeting the
demands of the first two criteria certainly will not be rejected by

TABLE I.—DISCRIMINATIVE VALUE OF NEW-TYPE QUESTIONS REPEATED AT COMPARABLE PERIODS*

Degree of discrimination | First usage, per cent of questions: Single-choice | Analogy | Wrong-word answer | Single-word completion | Second usage, per cent of questions: Single-choice | Analogy | Wrong-word answer | Single-word completion
Psychology 1 | (68)† | (43) | (17) | (55) | (68) | (43) | (17) |
Perfect discrimination | 86.8 | 90.7 | 88.2 | 89.1 | 70.6 | 69.8 | 64.7 |
One letter-grade displacement | 11.6 | 9.3 | 11.8 | | 26.5 | 25.6 | 29.4 |
A-B from D-F students | 1.4 | 0.0 | 0.0 | | 1.4 | 0.0 | 0.0 |
 | 0.0 | 0.0 | 0.0 | | 1.4 | 4.6 | 5.9 |
Psychology 2 | (74) | (34) | (40) | | (74) | (34) | (40) |
Perfect discrimination | 63.5 | 38.2 | 47.5 | | 52.7 | 38.2 | 52.5 |
One letter-grade displacement | 32.4 | 38.2 | 27.5 | | 40.5 | 38.2 | 22.5 |
A-B from D-F students | 2.7 | 17.6 | 10.0 | | 1.4 | 5.9 | 15.0 |
 | 1.4 | 5.9 | 15.0 | | 5.4 | 17.6 | 10.0 |

* By comparable periods is meant two final examinations, two first hour tests, or two second hour
tests, given during different years.
† Numbers in parentheses indicate the number of questions validated and repeated.


establishing a criterion as crude as the third one for the determination
of good items. Finally, there are items which in specific examinations
will not serve the ends of discrimination. They are uniformly too
easy or too hard, or, strangely enough, more D and F grade students
will pass them than B and C grade students.
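
The categories described above may be sketched as a small classification routine. This is an illustration under stated assumptions, not the writers' actual procedure: the function name `classify_item` and the label "poor" for items meeting none of the three criteria are hypothetical; equal pass-rates in adjacent grades are taken not to count as an inversion; and the A-B and D-F comparisons pool the students of the two grades concerned. Each grade is assumed to contain at least one student.

```python
from typing import Mapping, Tuple

GRADES = ("A", "B", "C", "D", "F")


def classify_item(passed: Mapping[str, Tuple[int, int]]) -> str:
    """Classify one item from its record on a single examination.

    passed[grade] = (students of that grade passing the item,
                     students receiving that grade on the examination)
    """
    rates = [passed[g][0] / passed[g][1] for g in GRADES]

    # An inversion: a lower letter grade passes the item more often
    # than the grade immediately above it.
    inversions = sum(
        1 for higher, lower in zip(rates, rates[1:]) if lower > higher
    )
    if inversions == 0:
        return "perfect discrimination"
    if inversions == 1:
        return "one letter-grade displacement"

    def pooled_rate(grades: Tuple[str, ...]) -> float:
        n_pass = sum(passed[g][0] for g in grades)
        n_all = sum(passed[g][1] for g in grades)
        return n_pass / n_all

    # More than one inversion: fall back on the cruder third criterion.
    if pooled_rate(("A", "B")) > pooled_rate(("D", "F")):
        return "A-B from D-F students"
    return "poor"  # hypothetical label for items failing all three criteria


if __name__ == "__main__":
    item = {"A": (95, 100), "B": (85, 100), "C": (40, 100),
            "D": (50, 100), "F": (30, 100)}
    print(classify_item(item))
```

In the example, more D students than C students pass the item, a single inversion, so the item falls in the second category. An item passed or failed by every student shows no inversion under the tie assumption made here; a stricter rule for ties would place such uniformly easy or hard items elsewhere.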

The practice of selecting the more discriminating items for constructing
new examinations is evident from the perusal of Table I
and the two tables which follow. A comparison of the tables in this
study with Table II in a previous investigation (p. 250)* establishes
the degree of selection. In Psychology 1 approximately seventy per
cent of single-word completion questions and fifty per cent of the three
other types of items proved originally capable of perfect discrimination.
For the same term, and when used in comparable periods, the first 
validations of the repeated items, irrespective of type, meet the same 
criterion from eighty-six to ninety per cent of the cases. In Psy- 
chology 2, the validated items originally proved less differentiating 
than those used in Psychology 1, and the explanatory hypothesis of 
greater homogeneity of the student body was offered. The construction
of new examinations has again profited from selection, but the
paucity of excellent and appropriate questions and, at times, ignorance
of actual item validity, have operated to keep out mainly the
poorest items. When the categories entitled perfect discrimination and
one letter-grade displacement are combined, we have convincing
evidence that “satisfactory” items were generally chosen; yet items
having little differentiating power were not always ignored and, as
will be shown later, some of these items serve satisfactorily upon
repetition.

New-type questions, of the kind under review, have marked 
stability when repeated with different classes of students in succeeding 
years and at comparable periods of progress in a course. At least, 
there is little likelihood in heterogeneous groups of students that 
questions meeting the first two criteria of validity will later keep com- 
pany with useless items. Items discriminating perfectly originally 
will fall most often into the next lower category when their status is 
changed. The proportion of items having satisfactory validity will 
remain remarkably constant. In more homogeneous groups of stu- 
dents we must expect, however, fewer satisfactory items, but again 
the original proportion of satisfactory and of poor items will be maintained.
Interchanges of position occur, as will be shown later, but
the total pattern will change only slightly. 

Examination of Table I affords little support to those who have 
assumed the utilization of items during different years will result 
in lower validity in consequence of students studying tests rather 
than course content. Acquaintance with item validity leads one 
to anticipate interchanges at the boundaries of the first two categories. 
Slightly different emphasis in lectures may be partly responsible for 
changed validity of items based on either lectures or texts. Similari- 
ties from first to second usage rather than differences are consistently 
apparent. 

The repetition of items within the same term involves slight 
changes not found when repetition occurs during different years. 
During the first and second terms, of the single-choice questions, a
fair number have been relegated upon repetition to ineffectiveness, 
and wrong-word answer questions in the second term have suffered 
similarly but to a lesser degree. The most stable items, the single- 
word completion questions, also contribute their largest quota to the 
poor items during the second term. Perhaps the better motivation 
of students who have received a low grade in the first test is sufficient 
to impair slightly the validity of certain items in a later test. If so, 
the two forms of recognition items; namely, single-choice and wrong- 


TABLE II.—DISCRIMINATIVE VALUE OF NEW-TYPE QUESTIONS REPEATED TWICE
WITHIN THE SAME TERM

Degree of discrimination | First usage, per cent of questions: Single-choice | Analogy | Wrong-word answer | Single-word completion | Second usage, per cent of questions: Single-choice | Analogy | Wrong-word answer | Single-word completion
Psychology 1 | (35)* | (28) | (20) | (29) | (35) | (28) | (20) | (29)
Perfect discrimination | 80.0 | 57.2 | 70.0 | 93.0 | 48.6 | 67.9 | 70.0 | 93.0
One letter-grade displacement | 14.3 | 35.7 | 25.0 | 7.0 | 34.3 | 21.4 | 20.0 | 3.5
A-B from D-F students | 2.8 | 0.0 | 0.0 | 0.0 | 5.7 | 0.0 | 5.0 | 3.5
 | 2.8 | | 5.0 | 0.0 | 11.4 | 10.7 | 5.0 | 0.0
Psychology 2 | (89) | (21) | (21) | (79) | (89) | (21) | (21) | (79)
Perfect discrimination | 61.7 | 52.4 | 52.4 | 78.5 | 38.2 | 66.6 | 57.2 | 59.5
One letter-grade displacement | 20.2 | 38.1 | 28.6 | 12.7 | 33.7 | 23.8 | 19.0 | 27.8
A-B from D-F students | 7.9 | 0.0 | 0.0 | 1.3 | 6.7 | 4.8 | 9.5 | 5.1
 | 10.1 | 9.5 | 19.0 | 7.6 | 21.4 | 4.8 | 14.3 | 7.6

* Numbers in parentheses indicate the number of questions validated and repeated.


word answer questions, seem to be remembered best and to be corrected
more by the initially inferior students. Again, the stability of
the items within the same quarter deserves more emphasis than the 
minor inconsistencies. 
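
The comparison of first and second usage may be made concrete by combining the two upper categories, which the writers elsewhere treat together as "satisfactory." The sketch below uses the percentages printed in the first block of rows of Table II, taken here (as an assumption) to be those for Psychology 1. The names `TABLE_II_FIRST_BLOCK`, `satisfactory_share`, and `change_on_repetition` are hypothetical.

```python
# (perfect discrimination, one letter-grade displacement), per cent of
# questions, as printed in the first block of rows of Table II.
TABLE_II_FIRST_BLOCK = {
    "single-choice":          {"first": (80.0, 14.3), "second": (48.6, 34.3)},
    "analogy":                {"first": (57.2, 35.7), "second": (67.9, 21.4)},
    "wrong-word answer":      {"first": (70.0, 25.0), "second": (70.0, 20.0)},
    "single-word completion": {"first": (93.0, 7.0),  "second": (93.0, 3.5)},
}


def satisfactory_share(perfect: float, one_displacement: float) -> float:
    """Per cent of questions meeting either of the first two criteria."""
    return round(perfect + one_displacement, 1)


def change_on_repetition(table: dict) -> dict:
    """Second-usage share minus first-usage share, by type of question."""
    return {
        kind: round(
            satisfactory_share(*usage["second"])
            - satisfactory_share(*usage["first"]),
            1,
        )
        for kind, usage in table.items()
    }


if __name__ == "__main__":
    for kind, delta in change_on_repetition(TABLE_II_FIRST_BLOCK).items():
        print(f"{kind}: {delta:+.1f} points")
```

On these figures the share of satisfactory single-choice items falls by about eleven points upon repetition within the term, while the other three types lose five points or fewer, which is consistent with the emphasis on stability in the text.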

Perhaps examination of Table II in relationship to Table I and 
Table III will serve to stress the greater instability of new-type ques- 
tions when they are used over again within the same quarter. No 
matter which of the four types of items are considered, when repetition 
occurs in a succeeding year at incomparable periods (for example, when 
questions which were originally used in a first hour test are repeated 
in a second hour test or a final examination) we may expect the same
proportions of poor items and of good items with which we began. 
There will be a smaller proportion of questions which will discriminate 
perfectly upon repetition and usually a larger proportion of items 
which will just fail to meet the most severe criterion. Those who have 
objected to the repetition of old items will find greatest support for 


TABLE III.—DISCRIMINATIVE VALUE OF NEW-TYPE QUESTIONS REPEATED
AT DIFFERENT INTERVALS*

Degree of discrimination | First usage, per cent of questions: Single-choice | Analogy | Wrong-word answer | Single-word completion | Second usage, per cent of questions: Single-choice | Analogy | Wrong-word answer | Single-word completion
Psychology 1 | (52)† | (46) | (13) | (43) | (52) | (46) | (13) | (43)
Perfect discrimination | 84.6 | 76.1 | 92.3 | 88.4 | 65.4 | 76.1 | 46.2 | 81.4
One letter-grade displacement | 15.4 | 21.7 | 7.7 | 11.6 | 30.8 | 19.5 | 46.2 | 14.0
A-B from D-F students | 0.0 | 2.2 | 0.0 | 0.0 | 0.0 | 2.2 | 7.7 | 2.3
 | 0.0 | 0.0 | 0.0 | 0.0 | 3.8 | 2.2 | 0.0 | 2.3
Psychology 2 | (44) | (13) | (16) | (25) | (44) | (13) | (16) | (25)
Perfect discrimination | 70.4 | 61.5 | 25.0 | 76.0 | 47.7 | 23.1 | 43.7 | 84.0
One letter-grade displacement | 25.0 | 15.4 | 43.7 | 24.0 | 45.5 | 53.8 | 37.5 | 16.0
A-B from D-F students | 2.3 | 23.0 | 6.3 | 0.0 | 2.3 | 23.1 | 0.0 | 0.0
 | 2.3 | 0.0 | 25.0 | 0.0 | 4.5 | 0.0 | 18.8 | 0.0

* Questions appearing in different years and in different examinations, as, for example, in first
hour test in one year and second hour test another year.
† Numbers in parentheses indicate the number of questions validated and repeated.


their position when repetition occurs within the same term and the
least support when items are used again in different years and
particularly at varying intervals. At present, the most feasible
interpretation of the lesser stability of items repeated within the same
quarter is one referring to motivation rather than to the availability
of questions in the files of the fraternal organizations. Students who
have secured high examination grades cannot improve the quality of a
satisfactory item, but students who through inquiry and discussion
learn that “they missed that one” will be more alert when encountering
it again; consequently, the discriminating power of some items will
decrease. If questions actually had been filed by fraternal groups for
future reference, then the repetition of items in succeeding years should
have been anticipated. The marked similarity in the proportions of
items in the various categories from first to second usage argues
against a leakage theory. The practice of benefiting from item
analyses is supported by this study of stability. Tests are improved
by the inclusion of items known to have high validity. Since validity
is not something intuitively perceived, but rather a product of usage
and analysis, new-type tests must be carefully constructed and built
in part from items of known value if measuring devices of marked
selectivity are an objective in college courses.

A second problem is the relative stability of the four kinds of items 
comprising the examinations. The greater discriminating power of 
now one and now another class of questions upon first usage is a product 
of selection. Actually, single-word completion questions are most 
valid. The preceding tables do not warrant, however, any facile 
conclusion that single-word completion items are the most stable. 
Inequalities in the original materials make interpretation concerning 
differences in stability difficult, although the weight of opinion must 
surely be tipped in favor of recall items by examination of the three 
tables. Another mode of analysis seems desirable. 

Equality in validity of different types of questions can be closely 
approximated by segregating those items which originally showed 
perfect discrimination. Relative stability will become evident as we 
trace their subsequent history. We have ignored differences in valid- 
ity ascribable to homogeneity of student ability in the second term as 
well as slight variations dependent upon the period when items were 
repeated. All items which upon first usage discriminated perfectly 
have been pooled under their appropriate names. We find two hun- 
dred sixty-four single-choice, one hundred twenty-two analogy, 
seventy-five wrong-word answer, and two hundred twelve single-word 
completion questions in the category of most valid items. What 
percentage of these items will discriminate perfectly again? These 
percentages, respectively, are fifty-seven, sixty-seven, fifty-seven, and 
seventy-seven. Recall questions definitely serve examination pur- 
poses through their superior stability, and, of the recognition questions, 
analogy questions, with their emphasis upon discovering relationships, 
enjoy second rank. 

The Stability of New-type Examinations 509

Most of the loss suffered by items which have discriminated
perfectly is transferred to the one letter-grade displacement category.
Keeping the same order of types of items as used above, we find
thirty-four, twenty-five, thirty-two, and eighteen per cent fail the most rigid
demands of validity by one step displacement. This means that only
nine, eight, eleven and four per cent of single-choice, analogy,
wrong-word answer, and single-word completion questions, respectively, have
been useless.
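The accounting in the two preceding paragraphs can be retraced with a few lines of Python. This is only a sketch built on the whole-number percentages quoted in the text; the names `ITEM_TYPES`, `perfect_again`, `one_step`, and `remaining_share` are introduced here for illustration. Subtracting the rounded figures gives five per cent for single-word completion items where the text reports four, a gap that presumably reflects rounding in the published percentages.

```python
# Percentages quoted in the text for items that discriminated perfectly
# on first usage, in the order used there.
ITEM_TYPES = ["single-choice", "analogy", "wrong-word answer", "single-word completion"]
perfect_again = [57, 67, 57, 77]   # per cent discriminating perfectly a second time
one_step = [34, 25, 32, 18]        # per cent falling to one letter-grade displacement


def remaining_share(perfect, displaced):
    """Per cent of items left for the two lowest categories of validity."""
    return [100 - p - d for p, d in zip(perfect, displaced)]


remainder = remaining_share(perfect_again, one_step)
for name, share in zip(ITEM_TYPES, remainder):
    print(f"{name}: {share} per cent in the two lowest categories")
```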

From time to time one letter-grade displacement items miss being 
considered most selective by insignificant percentage differences. We 
should anticipate their repetition to result in the transfer of many 
items to the higher level category of validity. This is the case. There
were seventy-four single-choice, forty-five analogy, thirty-two wrong- 
word answer, and thirty-four completion questions repeated from the 
one grade inversion group. In the order listed, we find fifty-one, 
fifty-eight, sixty-two, and eighty-two per cent of them have dis- 
criminated perfectly the second time and forty-two, thirty-three, 
nineteen, and eighteen per cent have remained in the one letter-grade 
displacement class. Remarkably few of these items become useless. 
It is now evident that new-type questions of high validity warrant 
confidence in their power to discriminate levels of performance from 
time to time. From a practical standpoint, effort spent in trying to 
improve these second quality items may be effective but misapplied, 
since a fair proportion would have served their intended purpose 
perfectly on repetition without the effort. Until it has been demon- 
strated that revision improves the quality of these second class items, 
an instructor might more profitably devote his time to poor items 
covering important information or to the construction of entirely new 
ones. 

It is, perhaps, unnecessary to make extended inquiry into the 
poorest items, for there are only sixty-nine in the two lowest categories. 
They are very unstable items with the exception of the few single-word 
completion questions. Poor single-word completion questions remain 
poor. The recognition types of questions remain poor in forty to 
fifty per cent of the cases. Items originally invalid proved to be highly 
unstable with the exception noted. 

510 The Journal of Educational Psychology

Finally, we shall consider briefly the differences in the proportion
of students who pass items when they are repeated. The handling of
four types of questions, in three different modes of repetition made
during two terms, has required twenty-four tables. These tables
indicate differences in percentages of students passing items from first to
second usage. The differences have been expressed separately
according to achievement levels ranging from A to F grades. Of the
sixty-eight single-choice questions used at comparable periods we know, for
example, that upon repetition twenty-nine of them showed exactly
the same percentage of A grade students passing the item as passed
it the first time; whereas these same sixty-eight items yielded only
eleven which were passed by the same proportion of F students from
one time to another. Some items are passed by much larger
percentages of students upon second usage and some by much smaller ones.

The median single-choice question is answered correctly by two
per cent more A grade students and six per cent more F grade students,
during both terms, when repeated in the same term than when first
given. For all types of questions repeated under all conditions of this
investigation and with all categories of letter-grades the median
percentages, indicative of changes, range from minus six to plus
twenty. There are one hundred twenty median differences available;
of these only ten indicate that over ten per cent more of the students
passed the repeated items than had done so originally, whereas
eighty-five of the median differences show that less than five per cent more
of the students passed the items at their second usage.
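The counts of tables and of median differences follow from the design of the study, as the short sketch below indicates. The labels in `MODES` and `TERMS` are descriptive names assumed here for illustration; the text itself states only that there were four types of questions, three modes of repetition, two terms, and five letter grades.

```python
from itertools import product

ITEM_TYPES = ["single-choice", "analogy", "wrong-word answer", "single-word completion"]
# Assumed descriptive labels for the three modes of repetition and the two terms.
MODES = ["same term", "succeeding year, comparable period", "succeeding year, different period"]
TERMS = ["first term", "second term"]
GRADES = ["A", "B", "C", "D", "F"]

# One table per combination of item type, mode of repetition, and term.
tables = list(product(ITEM_TYPES, MODES, TERMS))
# One median difference per table and letter grade.
medians = list(product(tables, GRADES))


def share(count, total):
    """Per cent of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(len(tables), "tables;", len(medians), "median differences")
print(share(10, len(medians)), "per cent of medians show a gain above ten per cent")
print(share(85, len(medians)), "per cent of medians show a gain below five per cent")
```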

There is no substantial basis for the fear that new-type questions 
cannot resist the many conditions which may be thought of as assailing 
their continued usefulness. If these questions had been public prop- 
erty, a greater percentage of students should have passed the items 
upon their repetition. The small median changes seem best ascribed 
to the factors as yet not measured in the complex conditions of learning 
and in the various samples of students. Although there are contrasts 
among the different types of items, the present method of analysis 
demonstrates the marked stability of these new-type questions. 


SUMMARY AND CONCLUSIONS 


The aim of this study was to determine the stability of four kinds 
of new-type questions which were repeated with the same students 
during the same term or with different students during different years. 
Stability is defined as the consistency with which individual items 
continue to differentiate students having one of five letter grades 
assigned to them in a course examination. 

Three lines of inquiry have been stressed in undertaking this 
analysis. The first concerned the proportion of items which upon 
repetition retained their discriminating power. Discrimination has 
been stated in terms of a four-step classification ranging from perfect 
discrimination through one letter-grade displacement, to items which
merely differentiate A-B grade students from D-F grade students, and
finally to those items having no discriminating power. A second 
consideration was the relative stability of the four kinds of new-type 
questions; namely, single-choice, analogy, wrong-word answer, and 
single-word completion items. The third line of investigation was 
into the percentage of students who passed the repeated items. For 
both the initial and the second usage, the relative validity of the 
question has been consistently kept in view. 

The major points covered in this article may be summarized 
briefly as follows: 

(1) New-type questions have marked stability when repeated at 
comparable periods in succeeding years and with different groups of 
students. Variations in validation occur, but rarely are they greater 
than one category change. The proportion of valid items remains 
fairly constant. When items are repeated with the same students 
during the same term, there is a slightly greater tendency toward 
lesser stability. This decrease in stability seems best explained in 
terms of differential levels of motivation. Students with the two 
lowest course grades are probably more alert to deficiencies and cor- 
rect more of them than do students who miss fewer items. A drop in 
validity with repetition is to be expected rather than an increase in
discriminating power.

(2) If the standard of validity chosen is that one which is most 
severe; namely, the criterion of perfect discrimination, then single- 
word completion questions are usually the most stable and single- 
choice recognition questions are the least stable. Only when 
questions are repeated during the same term to the same students, do 
analogy and wrong-word answer questions seem slightly more stable 
than single-word completion items. Single-choice recognition items 
are always the most unstable. 

(3) Questions do not appear to be significantly easier on second
usage than on first usage. Neither the retention of questions nor the
assumed filing of items by student groups for their own benefit or that
of their associates seems to be an important consideration in determining
item stability. These results indicate that if new-type questions
have been accumulated in large numbers, or if a limited supply is on
hand and their dissemination has been protected, an instructor may
permit old items to enter into the construction of new examinations
without lowering the validity of his test. The inclusion of items
demonstrated to possess high validity and the revision of definitely
invalid items covering important concepts should raise the validity
of an examination above that to be expected from the use of unanalyzed
questions.


RELATIONSHIPS BETWEEN THREE MULTIPLE
ORTHOGONAL FACTORS AND FOUR BIFACTORS 


KARL J. HOLZINGER 


University of Chicago 


The problem of the relationships between factors has been dealt 
with rather generally by Harman and myself in Student Manual of
Factor Analysis¹ and in a joint paper.² The illustrations used in these
discussions were from hypothetical data which were employed for 
simplicity. There is considerable value, however, in using actual 
data to illustrate our theorems, in order to show how well they apply 
to observational material. The desirability of formally expressing 
the general factor may also be concretely illustrated. 

The tests employed in the present example were taken from a 
study now in progress by Wenger, Freeman, and Holzinger. A brief 
description of these tests is given at the end of this article. 

Table I shows a bifactor pattern by Swineford and a multiple- 
orthogonal pattern, done by Harman independently of Miss Swineford’s 
solution. In this analysis, Harman began with the usual centroid 
solution and selected his angles of rotation so as to finally produce as 
large a number of positive loadings as possible. This solution is, of 
course, only one of a great many other multiple-orthogonal solutions 
which could be made, by changing the angles of rotation so as to
produce higher positive loadings and a few negative loadings, or by
seeking three zero loadings in a column, etc. It should also be noted
that we have confined the present illustration to orthogonal solutions 
for brevity, although some correlation of group factors is present in 
the general multiple-factor solution. 

It will be observed that a general factor, called “u,” is found for the
bifactor solution with three group factors, called “spatial” (s),
“verbal” (v), and “immediate memory” (m). These names were
assigned from the nature of the tests described below.

The multiple-orthogonal solution has three factors. Loadings in
this pattern which correspond to the group factor loadings of the
bifactor pattern have been underlined for convenience in comparison.





1 Holzinger, Karl J.; Swineford, Frances; and Harman, Harry H.: Student
Manual of Factor Analysis. The Statistical Laboratory, Department of Education,
University of Chicago, pp. vi + 102.

2 “Relationships between Factors from Certain Analyses.” Journal of
Educational Psychology, May, 1937, pp. 321-345.

It is thus apparent that $Z_1$ is largely “spatial,” $Z_2$ is chiefly
“immediate memory,” while $Z_3$, which seems to measure the “verbal” factor,
must involve some general ability because the loadings in this column
are consistently high. In order to clarify this crude correspondence of


TABLE I. TWO FACTOR PATTERNS FOR TWENTY-THREE TESTS

                Bifactor pattern             Multiple-orthogonal pattern
Test       u       s       v       m           Z1      Z2      Z3
  1      .619    .414    ....    ....         .618    .151    .382
  2      .440    .337    ....    ....         .521    .238    .125
  3      .668    .332    ....    ....         .598    .275    .350
  4      .522    .602    ....    ....         .721    .167    .214
  5      .616    .383    ....    ....         .640    .278    .254
  8      .618    .357    ....    ....         .575    .146    .391
  9      .318    .495    ....    ....         .572    .094    .084
 11      .778    ....    .502    .264         .281    .479    .689
 12      .770    ....    .066    ....         .491    .150    .609
 15      .757    ....    .192    ....         .309    .274    .659
 17      .845    ....     [?]    ....         .239    .391    .774
 18      .802    ....    .258    ....         .289    .268    .756
 21      .750    ....    ....    ....         .477    .357    .402
 22      .462    ....    ....    .438         .110    .554    .303
 23      .629    ....    ....    .426         .188    .640    .385
 24      .606    ....    ....    .665         .120    .729    .395
 25      .624    ....    ....    .472         .202    .702    .335
 26      .430    ....    ....    ....         .162    .205    .355
 27      .610    .446    ....    ....         .679    .257    .240
 33      .848    ....    .317    ....         .267    .284    .828
 34      .834    ....    .172    ....         .346    .247    .754
 36      .487    .236    ....    ....         .414    .055    .409
 37      .668    .288    ....    ....         .564    .124    .498

factors, we next proceed to show the linear relationships among the 
factors involved, using the methods described in Student Manual 
of Factor Analysis. 

The necessary and sufficient conditions for the expression of one
set of factors in terms of the other to be in standard form may be
stated as follows:

(1) The communalities must be the same in the two solutions.

(2) The intercorrelations of the tests, obtained from the patterns,
must be the same. This implies that both patterns are equally good
fits to the observed correlations.

In the process of relating the two sets of factors it is necessary to
obtain as many equations as there are group factors. We, therefore,
reduce all the original tests to three new variables denoted here as
$z_a$, $z_b$, and $z_c$, corresponding to the group factors in the bifactor
pattern. Thus $z_a$ is formed for both bifactor and multiple-factor
solutions by adding tests 1, 2, 3, 4, 5, 8, 9, 27, 36 and 37 (spatial tests),
and then dividing by the combined standard deviation of these tests to
reduce the new variable to unit variance.

The reduced patterns from this work will next be written as
equations rather than in the abbreviated tabular form. We thus obtain

\[
\begin{aligned}
z_a &= .824Z_1 + .249Z_2 + .411Z_3 \qquad (h = .954)\\
z_b &= .221Z_1 + .760Z_2 + .516Z_3 \qquad (h = .945)\\
z_c &= .365Z_1 + .344Z_2 + .834Z_3 \qquad (h = .973)
\end{aligned}
\tag{1}
\]

where “h” is the square root of the communality, that is, of the sum of
the squares of the factor loadings for each variable. The corresponding
equations from the bifactor pattern become

\[
\begin{aligned}
z_a &= .777u + .543s \qquad (h = .948)\\
z_b &= .759u + .123v + .554m \qquad (h = .948)\\
z_c &= .927u + .301v + .043m \qquad (h = .976)
\end{aligned}
\tag{2}
\]

From the values for h, it is apparent that the communalities of the
reduced tests are essentially the same for corresponding tests in the
two patterns. Our first condition is, therefore, met satisfactorily.
The intercorrelations of the reduced tests from patterns (1) and (2) are
given in Table II.¹ The slight differences between corresponding
correlations indicate that our second condition for relating factors has
also been met.
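Both conditions can be checked directly from the coefficients of equations (1) and (2). The sketch below is offered only as an illustration: it recomputes h for each composite test and the correlations implied by each pattern, taking the correlation between two composites as the sum of products of their loadings on the orthogonal factors. The dictionary and function names are introduced here and are not part of the original analysis.

```python
from math import sqrt

# Loadings of the composite tests z_a, z_b, z_c as printed in equations (1) and (2).
MULTIPLE = {                      # on Z1, Z2, Z3
    "a": (.824, .249, .411),
    "b": (.221, .760, .516),
    "c": (.365, .344, .834),
}
BIFACTOR = {                      # on u, s, v, m
    "a": (.777, .543, .000, .000),
    "b": (.759, .000, .123, .554),
    "c": (.927, .000, .301, .043),
}


def root_communality(loadings):
    """Square root of the sum of squared loadings (the h of the text)."""
    return sqrt(sum(x * x for x in loadings))


def reproduced_correlation(p, q):
    """Correlation implied by two rows of an orthogonal factor pattern."""
    return sum(x * y for x, y in zip(p, q))


def summarize(pattern):
    """Return rounded h values and rounded intercorrelations for a pattern."""
    h = {k: round(root_communality(v), 3) for k, v in pattern.items()}
    r = {pair: round(reproduced_correlation(pattern[pair[0]], pattern[pair[1]]), 3)
         for pair in ("ab", "ac", "bc")}
    return h, r


h_multiple, r_multiple = summarize(MULTIPLE)
h_bifactor, r_bifactor = summarize(BIFACTOR)
# Differences between the rounded correlations, as tabulated in Table II.
differences = {pair: round(r_multiple[pair] - r_bifactor[pair], 3) for pair in r_multiple}

print("h, multiple-orthogonal:", h_multiple)
print("h, bifactor:           ", h_bifactor)
print("r, multiple-orthogonal:", r_multiple)
print("r, bifactor:           ", r_bifactor)
print("differences:           ", differences)
```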

TABLE II. INTERCORRELATIONS OF COMPOSITE TESTS

          Multiple orthogonal          Bifactor               Difference
           z_a       z_b            z_a       z_b           z_a       z_b
z_b       .583                     .590                    -.007
z_c       .729      .772           .720      .764           .009      .008

1 Since test 11 is included in both $z_b$ and $z_c$ there is an additional overlap between
them, which has been omitted from equations (1) and (2) because it does not
affect measures of relationships between factors. The correlations of Table II
were computed without taking this slight overlap into account.

In determining the relationships amongst factors we equate the
expression for $z_a$ from equations (1) to the corresponding expression
from equations (2), and similarly for $z_b$ and $z_c$. We then regard the
multiple factors $Z_1$, $Z_2$, $Z_3$ as three unknowns and the four bifactors,
u, s, v and m, as knowns. If the determinant for equations (1) does
not vanish, we may then solve for the Z factors in terms of the
bifactors. The solutions obtained are as follows:


\[
\begin{aligned}
Z_1 &= .455u + .842s - .216v - .152m\\
Z_2 &= .343u + .007s - .117v + .963m\\
Z_3 &= .771u - .372s + .504v - .279m
\end{aligned}
\tag{3}
\]





As checks on the work we find the standard deviations from
equations (3) to be $\sigma_{Z_1} = .993$, $\sigma_{Z_2} = 1.029$, $\sigma_{Z_3} = 1.032$. These are very
nearly unity. The intercorrelations of the Z's from equations (3)
are also nearly zero, being .041, -.029, and -.066.
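The solution and the two checks may be retraced numerically. The following sketch uses only the coefficients printed in equations (1) and (2) and solves the three equations by Cramer's rule; the helper names `det3` and `solve3` are introduced here for illustration and the original computation was of course done by hand. Under these inputs the u coefficient of $Z_1$ comes out near .455, which is also the value implied by the entry .207 in equations (4), and the remaining coefficients agree with equations (3) to about the third decimal.

```python
from math import sqrt

# Coefficients of equations (1): rows z_a, z_b, z_c on Z1, Z2, Z3.
A = [[.824, .249, .411],
     [.221, .760, .516],
     [.365, .344, .834]]
# Coefficients of equations (2): rows z_a, z_b, z_c on u, s, v, m.
B = [[.777, .543, .000, .000],
     [.759, .000, .123, .554],
     [.927, .000, .301, .043]]


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a X = b by Cramer's rule, where a is 3 x 3 and b is 3 x k."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("the determinant vanishes; no unique solution")
    x = [[0.0] * len(b[0]) for _ in range(3)]
    for j in range(len(b[0])):
        for i in range(3):
            replaced = [row[:] for row in a]
            for r in range(3):
                replaced[r][i] = b[r][j]   # swap column i for column j of b
            x[i][j] = det3(replaced) / d
    return x


Z = solve3(A, B)   # rows Z1, Z2, Z3 expressed on u, s, v, m
sigmas = [sqrt(sum(c * c for c in row)) for row in Z]
inner = {pair: sum(p * q for p, q in zip(Z[pair[0]], Z[pair[1]]))
         for pair in [(0, 1), (0, 2), (1, 2)]}

for name, row in zip(("Z1", "Z2", "Z3"), Z):
    print(name, [round(c, 3) for c in row])
print("standard deviations:", [round(s, 3) for s in sigmas])
print("sums of products:", {k: round(v, 3) for k, v in inner.items()})
```

Each row of `Z` holds the weights of u, s, v, and m for one multiple factor, so the standard deviations are the square roots of the row sums of squares and the intercorrelation checks are the sums of products between rows.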

From equations (3) it is apparent that $Z_1$ is largely the bifactor s,
and $Z_2$ is chiefly the bifactor m. $Z_3$, however, is more largely made up
of u than of v.

Instead of expressing the factors in linear form we may square all
the values in equations (3), divide by N, and obtain the relative
contributions of the bifactors to the total unit variance of the Z's.

\[
\begin{aligned}
\sigma_{Z_1}^2 &= .207\sigma_u^2 + .709\sigma_s^2 + .047\sigma_v^2 + .023\sigma_m^2\\
\sigma_{Z_2}^2 &= .118\sigma_u^2 + .000\sigma_s^2 + .014\sigma_v^2 + .927\sigma_m^2\\
\sigma_{Z_3}^2 &= .594\sigma_u^2 + .138\sigma_s^2 + .254\sigma_v^2 + .078\sigma_m^2
\end{aligned}
\tag{4}
\]

The variances of the bifactors are equal to one, but they are
expressed here formally to indicate the allocation of total variance.
It is apparent from these equations that over seventy per cent of the
variance of $Z_1$ is attributable to the bifactor s, while over ninety
per cent of the variance of $Z_2$ is attributable to the factor m. In the
case of $Z_3$, however, more than twice as much variance is due to u
as to the factor v.
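The entries of equations (4) are simply the squares of the coefficients in equations (3), as the brief sketch below shows. It takes the u coefficient of $Z_1$ as .455, the value consistent with the printed contribution .207; the names `COEFFICIENTS` and `contributions` are introduced here for illustration.

```python
# Coefficients of equations (3): weights of u, s, v, m in each multiple factor.
# The u weight of Z1 is taken as .455, the value consistent with .207 in equations (4).
COEFFICIENTS = {
    "Z1": {"u": .455, "s": .842, "v": -.216, "m": -.152},
    "Z2": {"u": .343, "s": .007, "v": -.117, "m": .963},
    "Z3": {"u": .771, "s": -.372, "v": .504, "m": -.279},
}


def contributions(weights):
    """Squared weights: the share of variance allotted to each bifactor."""
    return {factor: round(w * w, 3) for factor, w in weights.items()}


variance_shares = {name: contributions(w) for name, w in COEFFICIENTS.items()}
for name, shares in variance_shares.items():
    print(name, shares, "total", round(sum(shares.values()), 3))

# Ratio of the general-factor share to the verbal share in Z3.
ratio_u_to_v = variance_shares["Z3"]["u"] / variance_shares["Z3"]["v"]
print("u to v in Z3:", round(ratio_u_to_v, 2))
```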

Returning to Table I, we may note an example of a poor test of
group factor ability. This is test 21. In the bifactor analysis it
had an appreciable loading only with the general u factor. In the
multiple-orthogonal solution this test has fairly high loadings for all
three Z factors, which we should expect from equations (3).

The comparisons we have made illustrate the fact that we may
express a set of tests as linear functions of a general and several group
factors, or as functions of group factors alone. It is also shown that the
exact relationships between these two sets of factors may be given if
certain very reasonable conditions are met. Both solutions are based
on the method of least squares, and they are equally correct in the
sense that a reasonable set of rules and assumptions has been employed
for each. We cannot, therefore, accept or refuse a solution with a
general factor on the basis of statistical rules alone, except in the rare
case that the correlations are sufficiently high to demand such a factor
under boundary conditions.¹

We next propose to set down some arguments which lead us to 
prefer the bifactor to the multiple-factor solution in general. Some 
of these arguments are matters of common sense, while others have a 
statistical basis. Unfortunately, there does not appear to be any 
undisputed psychological fact which can assist us in our choice of 
analysis. 

(1) The bifactor method leads to the simpler solution. Thus in 
Table I there are only forty-five common factor loadings by that 
method, whereas there are sixty-nine such loadings for the multiple- 
orthogonal solution. With more tests and factors, this contrast in 
simplicity becomes more pronounced. 

(2) The bifactor solution is much easier and more rapid. The 
estimated times for calculation in the above example are about in the 
ratio of one to three. 

(3) If the group factors are correlated (as they usually are) this
implies an additional factor amongst all variables, i.e., a general
factor. We should bring this factor out in the open for the same
reason that we do so in the case of group factors. Intercorrelation of
either tests or factors is the justification for introducing underlying
factors. The so-called oblique analysis is not, from this viewpoint,
considered as complete.

1 For a brief discussion of such boundary conditions see Student Manual of
Factor Analysis, Chap. vii.

(4) In the case of orthogonal solutions the correlation between
tests in different groups is accounted for more simply by the bifactor
method. For example, the correlation between tests 1 and 11 from
Table I may be thought of as the product of the general bifactor
loadings .619 × .778 = .482. To account for this correlation in the
multiple-orthogonal solution we must make use of the loadings for
all three Z factors. The correlation thus becomes: .618 × .281 +
.151 × .479 + .382 × .689 = .509. In other words the common
factor is present in the multiple-orthogonal solution, but is expressed by
overlappings of group factors. In statistical language this means that
the multiple-orthogonal solution has a greater complexity than the
bifactor solution.
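The arithmetic of argument (4) can be reproduced from the two rows of Table I concerned. In this sketch the loadings for tests 1 and 11 are entered as read from Table I, with .382 as the $Z_3$ loading of test 1; the dictionaries and the function `reproduced_correlation` are illustrative names introduced here. Only factors on which both tests have loadings contribute a product.

```python
# Loadings of tests 1 and 11 as read from Table I.
BIFACTOR = {
    1: {"u": .619, "s": .414},
    11: {"u": .778, "v": .502, "m": .264},
}
MULTIPLE = {
    1: {"Z1": .618, "Z2": .151, "Z3": .382},
    11: {"Z1": .281, "Z2": .479, "Z3": .689},
}


def shared_factors(first, second):
    """Factors on which both tests have a loading."""
    return sorted(set(first) & set(second))


def reproduced_correlation(first, second):
    """Sum of products of loadings over the factors common to both tests."""
    return sum(first[f] * second[f] for f in shared_factors(first, second))


r_bifactor = reproduced_correlation(BIFACTOR[1], BIFACTOR[11])
r_multiple = reproduced_correlation(MULTIPLE[1], MULTIPLE[11])

print("bifactor:", round(r_bifactor, 3), "using", shared_factors(BIFACTOR[1], BIFACTOR[11]))
print("multiple-orthogonal:", round(r_multiple, 3), "using", shared_factors(MULTIPLE[1], MULTIPLE[11]))
```

The bifactor pattern needs a single product, through u alone, while the multiple-orthogonal pattern needs three, which is the sense in which the latter is the more complex account of the same correlation.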

There are doubtless arguments to be made on the other side and, 
perhaps, there may emerge some psychological bases for preference of 
pattern, but such questions will not be dealt with here. 

For the sake of emphasis we should like to repeat the point of this 
whole illustration about which there has certainly been misunder- 
standing and confusion. It is quite wrong to say that no general 
factor exists, since it does not appear explicitly in a multiple-factor 
solution. We have tried to show that the general factor may, and we 
believe should, be formally expressed. This does not mean that we 
refuse multiple-factor solutions, but we believe they should not be 
used to deny the general factor. 


BRIEF DESCRIPTION OF THE TESTS 


1. Picture rearrangement: mental rearrangement of parts of pictures,
   indicating correct arrangement by numbers.
2. Picture completion: drawing in of missing parts of a series of pictures.
3. Pattern series completion: drawing in of missing elements of orderly
   serial patterns.
4. Form board: drawing given parts in a given whole figure.
5. Designs: indicating by number the nine components of given block
   designs (combinations of six different blocks).
8. Picture sequence: indications of proper sequence of disarranged comic
   cartoon strips.
9. Inverted drawing: drawing the true inverses of line figures.
11. Sentence completion: writing in omitted word or words of sentences.
12. Verbal analogies: discovering the fourth part of two analogous word
    pairs.
15. Reasoning: solution of true or disguised syllogisms.
17. Vocabulary: multiple choice synonyms (no time limit).
18. Opposites: writing the opposites of a list of common words.
21. Oral directions: immediate memory span in terms of execution of orally
    presented directions.
22. Digits: immediate memory span for orally presented digit series (Recall).
23. Sentences: immediate memory span for orally presented groups of related
    words (Recall).
24. Words: immediate memory span for orally presented groups of unrelated
    words (Recall).
25. Objects: immediate memory span for visually presented groups of line
    drawings of common objects (Recall).
26. Faces and names: recognition among twenty-five faces of twelve faces
    studied three minutes earlier. Recall of twelve first and last names.
27. Designs: immediate memory for details of figures visually presented, one
    at a time (Recall).
33. Comprehension of directions: reading, comprehending, and carrying out
    written directions (no time limit).
34. Word grouping: discovering the common element in four of five words
    (no time limit).
36. Pattern grouping: discovering the common element in four of five
    figures (no time limit).
37. Pattern analogies: discovering the fourth part of two analogous pattern
    pairs (no time limit).


A COMPARISON OF THE SCORES ON THE SPEARMAN
VISUAL PERCEPTION TEST, PART I,
ADMINISTERED BY VERBAL AND
PANTOMIME DIRECTIONS¹


IRVING LORGE AND SETH ARSENIAN 
Teachers College, Columbia University 


In 1932 and 1933, Charles Spearman developed for experimental
purposes a test for the measurement of “g,” i.e., the ability involved in
the eduction of relations and correlates. This test, the Spearman 
Visual Perception Test, consisted of three parts, of which only Part 
I was used in connection with this study. 

Part I of the Spearman Visual Perception Test, hereinafter called 
SVPI, consists of six subtests and a fore-exercise called forms 1, 2, 3, 
4, 5, 6, and 0, respectively. The essential task involved in these sub- 
tests is to see relationships between and among geometrical figures. 
Originally, the directions provided by Spearman were given orally, 
and demonstrated in the fore-exercise, subtest 0. Thereafter, seven 
minutes were allowed on a speed-power basis for each of the subtests. 
Later, in connection with the contribution of Karl Holzinger and 
Charles Spearman to the Unitary Traits Study, the directions and 
procedure involved in the administration of the SVPI were revised. 
Subtest 1 (Form 1) was used as fore-exercise in the revised procedure,
and five minutes were allowed on a speed-power basis for the completion
of subtests 2, 3, 4, 5, and 6. In the Holzinger-Spearman directions
and procedure, the method was explained orally.

When the junior author wished to study the problem of bilingualism
in the measurement of mental development,² he developed directions
and procedure for non-language pantomimic administration. These
directions are printed as an appendix to an article by the junior author.³





1 Acknowledgment is hereby made of the services rendered by the personnel 
furnished by the Works Division, Emergency Relief Bureau of New York City on 
Project 89F B-125X. 

This study is part of a larger study in Interests, Attitudes, and Motives sup- 
ported in part by a grant from the Columbia University Council for Research in 
the Social Sciences. 

2 Arsenian, Seth: Bilingualism and Mental Development. Bureau of
Publications, Teachers College, Columbia University, 1937, vi + 164.

3 Arsenian, Seth: “The Spearman Visual Perception Test (Part I), with
pantomime directions.” The British Journal of Educational Psychology, Vol. vii,
1937, pp. 287-301.

In the pantomimic administration, the first six lines of subtest 1 were
used for demonstration. Subtest 0 was omitted, and subtests 2, 3, 4, 
5, and 6 were given on a speed-power basis for five minutes each. 

These two methods, pantomime and oral, are similar in the time
allowed for each of the five subtests used, and in omitting subtest 0
either as fore-exercise or as exercise. The essential difference in the
two methods is that the pantomime directions exclude all language,
whereas the Holzinger-Spearman directions use oral language.

Since the essential purpose of the SVPI is to obtain a measure of
“g,” the two methods of administering the SVPI may be evaluated by
the degree to which either method measures “g.” Using Holzinger’s
evaluation of the IER Intelligence Scale CAVD,¹ it may be assumed
that the CAVD is a measure of “g.” Holzinger reports the correlation
of the CAVD with “g” as .960, which is sufficiently high to allow the
use of CAVD as a criterion of “g.”

In the evaluation of the two methods of administering SVPI, two 
groups of forty subjects each were selected from an adult population 
made available to the senior author by the Works Progress Adminis- 
tration of New York City. These groups were selected on the basis 
of their chronological age and their average score on five forms of the
IER Intelligence Scale CAVD² so that age and CAVD differences were
approximately zero. Each person of one group was equated to
another person in the second group on the basis of these scores.
Then, randomly, one group was given the SVPI with pantomime
directions, and the other group was given the SVPI with the Holzinger- 
Spearman directions. The correlation between the CAVD scores of 
the pantomime group and the CAVD scores of the verbal group was 
.98; the correlation for age was .95. The mean CAVD of the panto- 
mime group was 400.4, and of the verbal group was 400.5 with standard 
deviations of 15.6 and 15.1, respectively. For age, the means were 
447.2 months and 442.5 months, with standard deviations of 137.5 
and 138.9 for the pantomime and verbal groups, respectively. 

The results of the administration of the SVPI to the two groups
were as follows:





1 Holzinger, Karl J.: “Thorndike’s CAVD is full of ‘g,’” The Journal of Educa-
tional Psychology, Vol. XXII, 1931, pp. 161-166.

2 Thorndike, E. L., Woodyard, E., and Lorge, I.: “Four new forms of the
IER Intelligence Scale [CAVD] for use on the college or higher levels.” School
and Society, Vol. XLII, 1935, pp. 271-272.
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Pantomime group Verbal group 
Mean ................................ 185.9   251.2
[label illegible] ...................  74.2    89.2
Reliability .........................   .95     .93
r (CAVD)(age) .......................  -.25    -.27
r (CAVD)(SVPI) ......................   .58     .48
[label illegible] ...................  -.49    -.57











Moreover, the correlation between the Spearman scores of the two 
groups, using the original matching as the basis for pairing, was .49.
It may be inferred that a genuine difference in mean SVPI score would
be a sequel of the administration of the test to equivalent groups by
the two different methods used. The correlation indicates a relation-
ship between the scores by the two methods. The essential evaluation,
however, is the degree to which the two tests measure “g.”

It will be seen that the correlation of SVPI with CAVD in the 
pantomime group is higher than it is in the verbal group. The differ- 
ence, however, is not statistically significant. Therefore, it may be 
inferred that either pantomime or verbal directions in administering 
the SVPI give results similar in reliability (.95 and .93), and approxi- 
mately similar in validity (.58 and .48). 
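
The source does not say how the difference between the two validity coefficients was tested. As a minimal sketch, assuming Fisher's z transformation for two independent correlations (a technique substituted here for illustration, not necessarily the one the authors used) and forty subjects in each group, the comparison might be checked as follows. The function name is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Ratio of the difference between two independent correlations
    to its standard error, computed after Fisher's z transformation."""
    z1 = math.atanh(r1)  # Fisher z of the first correlation
    z2 = math.atanh(r2)  # Fisher z of the second correlation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    return (z1 - z2) / se


# SVPI-CAVD correlations reported for the pantomime and verbal groups.
ratio = fisher_z_difference(0.58, 40, 0.48, 40)
print(round(ratio, 2))
```

Under these assumptions the ratio falls well below the conventional 1.96 threshold, which agrees with the statement that the difference is not statistically significant.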
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A STUDY OF THE EFFECTS OF SCHOOL 
ACCELERATION UPON THE PERSONALITY AND 
SOCIAL ADJUSTMENTS OF HIGH-SCHOOL AND 

UNIVERSITY STUDENTS* 


THELBURN L. ENGLE 
Indiana University 
INTRODUCTION 


If a child is capable of making more rapid school progress than his 
fellow pupils, should he be accelerated? Conflicting opinions and 
conflicting data have been presented by those who have worked on 
this very complex problem. In the present study an attempt has 
been made to compare the personality and social adjustments of 
accelerated and non-accelerated high-school and university students. 

Most investigators have taken care to point out that accelerated 
students are well above the mean in general intelligence and school 
marks, but they have not taken into consideration the possibility that 
these factors in themselves might have an influence on personality and 
social adjustments. In the present study accelerated and non-
accelerated students have been matched on four bases: (1) Sex, (2)
Grade location (freshman, sophomore, junior, senior), (3) Intelligence
rating as measured by standardized intelligence tests, (4) School
marks. The principal variable is the chronological age at which either
high school or university has been entered. Also, comparisons have 
been made between subjects matched on the bases of age, sex, intelli- 
gence rating, and school marks, with grade location variable. It 
is to be noted that the problem here studied is not concerned with 
how acceleration has taken place. Personality schedule scores and 
the number and kinds of social activities participated in have been 
used as bases of comparison. Opinions of accelerated subjects have 
been obtained on whether or not acceleration has been a social handi- 
cap to them, and, if so, in what ways. 


RESEARCH PROCEDURE 


Selection and Matching of Subjects.—The subjects used in the
present study were selected from three high schools and one university. 
The three high schools were: Shortridge High School of Indianapolis; 


* Publications of the Indiana University Psychological Clinics, Series II, 
Number 17. 





524 The Journal of Educational Psychology 


George Washington High School of Indianapolis; and Bloomington 
High School of Bloomington, Indiana. The university subjects were 
selected from students attending Indiana University. The records 
of each high school were searched for the names of pupils who had 
entered that high school prior to their thirteenth birthday. All
subjects were attending school during the 1935-1936 school year. 
Because of the uncertainty of the records and because of changes in 
environment, only such pupils as had entered and remained in the 
given school were considered. Of the one hundred thirty-six pupils 
who had entered these three high schools prior to their thirteenth 
birthday, seventy-seven were girls and fifty-nine were boys. For
various reasons it was impossible to use all of these pupils in the later 
calculations. No colored pupils were used because of the possible 
effect of racial differences. 

The files of the Director, Personnel Division, Indiana University, 
were searched for the names of those students who had entered before 
their seventeenth birthday. In order to secure uniformity of environ- 
ment and because of possible difficulties in transferred records, only 
the names of those having attended no other college than Indiana 
University were included. The records for classes entering Indiana 
University in 1932, 1933, 1934, and 1935 were used. The members of 
the class entering the University in 1932 were seniors at the time this 
study was made. Of one hundred sixty-five students who had entered 
Indiana University prior to their seventeenth birthday, ninety-nine 
were women and sixty-six were men. For various reasons not all of 
these students could be used in the later work of the present study. 
One colored man and one colored woman were not used in later 
calculations. 

After lists of accelerated students were obtained, the next step was
to match as many as possible with students who had entered high 
school or college at the more usual ages of fourteen (after fourteenth 
and before fifteenth birthday) and eighteen (after eighteenth and 
before nineteenth birthday), respectively. Pairs were matched on 
four bases: 

(1) Sex. 

(2) Grade location. Freshmen were matched with freshmen, 
sophomores with sophomores, and so forth. In this matching no 
attempt was made to count hours of credit. It was considered that 
for the purposes of the present study length of residence was a better 
criterion than hours of credit. 
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(3) Intelligence rating as measured by standardized intelligence 
tests. IQ’s obtained from various group tests of mental ability were 
available for all subjects. In all cases, so far as possible, IQ’s obtained 
from one test were matched with other IQ’s obtained from the same 
test. Pupils were considered to be matched if their IQ’s were within 
five points of each other. The mean IQ of the one hundred accelerated 
subjects who were finally selected for intensive study was 117.1; for 
the one hundred matched non-accelerated subjects the mean IQ was 
115.2. 

At Indiana University each entering student is required to take 
the American Council on Education Psychological Examination.
Scores are recorded on the student records in terms of centiles. Stud- 
ents were considered to be matched if their centile scores were within 
five points of each other. The mean centile score of the sixty-four 
accelerated subjects who were finally selected for intensive study was 
72.3; for the sixty-four matched non-accelerated subjects the mean 
centile score was 72.7. No claim is made that the matchings at the 
high-school and university levels are equivalent. Both merely repre- 
sent an attempt to hold as nearly constant as practical the factor 
of general intelligence as measured by intelligence tests. 

(4) School marks. Although the marking systems in the Indian-
apolis and Bloomington schools were not identical, both were five-
letter systems. For the purposes of this study these marks were
weighted 4, 3, 2, 1, and 0, the weighting 4 being given to the highest 
mark for the school, the weighting 0 being given to the failing mark. 
Pupils were considered to be matched in school marks if their marks 
were within five-tenths of a point of each other. The mean school 
mark of the one hundred accelerated subjects who were finally selected 
for intensive study was 2.38; for the one hundred matched non- 
accelerated subjects the mean was 2.39. Final semester marks were 
used in all cases. 
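
The weighting of high-school marks described above can be sketched as follows. The letter labels A, B, C, D, and F are hypothetical, since the source states only that both schools used five-letter systems. The function name is likewise an assumption made for illustration.

```python
# Hypothetical letter labels; the source gives only the weights 4, 3, 2, 1, 0.
MARK_WEIGHTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def mean_school_mark(letters):
    """Mean weighted mark for one pupil's final semester marks."""
    if not letters:
        raise ValueError("at least one mark is required")
    return sum(MARK_WEIGHTS[letter] for letter in letters) / len(letters)


# A pupil with one highest mark, one second-highest, and two middle marks.
example_mean = mean_school_mark(["A", "B", "C", "C"])
print(example_mean)  # 2.75
```

A pupil's mean computed in this way is the quantity that was required to lie within five-tenths of a point of the partner's mean.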

At Indiana University a range of five marks is used: A, B, C, D, 
and F. A is the highest mark, the others descending in order to F, 
which represents failure. A scholastic index is figured for each student 
for each semester. In figuring this index each hour of mark A is 
weighted 3, each hour of mark B is weighted 2, C—1, D—0, F—40. 
The scholastic index is obtained by dividing the weighted score by the 
total number of hours of work carried. Students were considered to 
be matched in scholastic indices if their indices were within five- 
tenths of a point of each other. The mean scholastic index of the 
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sixty-four accelerated subjects who were finally selected for intensive 
study was 1.97; for the sixty-four matched non-accelerated subjects 
the mean was 1.95. No claim is made that the matchings at the 
high-school and university levels are equivalent. Both merely repre- 
sent an attempt to hold as nearly constant as practical the factor of 
school marks. 

To be considered as a perfectly matched pair, subjects had to be 
matched in accord with all four criteria. It was found to be impossible 
to match all in this way so some were matched approximately. In 
these approximately matched pairs the criteria of sex and grade loca- 
tion were always met and usually either intelligence or school marks 
could be matched, but not both. In all such cases of approximate 
matches the nearest possible match was secured. In the cases of high- 
school subjects, matching was always done within the given school, 
thus to some extent holding constant the factor of environment. By 
means of this matching technique eighty-four perfectly matched pairs 
and sixteen approximately matched pairs of high-school students, 
and sixty-four perfectly matched pairs of university students were 
obtained. Unless otherwise indicated all data in the remainder of 
this study will be based on these three hundred twenty-eight subjects. 
Correlations between the pairs were calculated. Of course, correla- 
tions in sex and grade location were perfect for all pairs. For the 
100 pairs of high-school subjects the correlation between the IQ’s of 
accelerated and non-accelerated subjects was +.94 ± .01; for school
marks the correlation was +.93 ± .01. For the 64 pairs of university
subjects the correlation between the intelligence centile ranks of
accelerated and non-accelerated subjects was +.99 ± .002; for
scholastic indices the correlation was +.88 ± .020.
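
The four matching criteria for high-school pairs can be summarized in a short sketch. It is a simplification under stated assumptions: the tolerances are treated as inclusive, every pair agreeing in sex and grade location but not in both remaining criteria is labeled approximate, and the class and function names are hypothetical rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    sex: str
    grade: str    # grade location, e.g. "freshman"
    iq: float     # IQ from a group test of mental ability
    mark: float   # mean weighted school mark on the 0-4 scale


def classify_pair(a, b, iq_tol=5.0, mark_tol=0.5):
    """Classify a candidate pair as 'perfect', 'approximate', or 'unmatched'."""
    # Sex and grade location had to agree in every pair that was used.
    if a.sex != b.sex or a.grade != b.grade:
        return "unmatched"
    iq_ok = abs(a.iq - b.iq) <= iq_tol
    mark_ok = abs(a.mark - b.mark) <= mark_tol
    return "perfect" if iq_ok and mark_ok else "approximate"


accelerated = Subject("F", "freshman", 118, 2.5)
print(classify_pair(accelerated, Subject("F", "freshman", 115, 2.4)))  # perfect
print(classify_pair(accelerated, Subject("F", "freshman", 110, 2.4)))  # approximate
```

The sketch does not reproduce the search for the nearest possible partner, which the source describes only in general terms.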

The Personality Schedule——In order to secure a measure of the 
personality adjustments of these subjects a personality schedule was 
given. Many schedules were available for use, but most of them were 
not suited to children as young as age twelve. After a careful study 
of the available material it was decided to use the Cowan Adolescent 
Personality Schedule.¹ The scoring key published with the schedule
indicates the responses which are to be considered as maladjusted. 
Therefore, in the following data the higher the score the greater is the 
personality maladjustment. 





1 Published in Child Development, Vol. VI, 1935, pp. 77-87 and used by special
permission of the author and of the publishers. Now available from the Wichita 
Child Research Laboratory, Wichita, Kansas. 
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In the present study, the Cowan schedule was given to all subjects 
in the form in which it was published. The statements were mimeo-
graphed on eight pages bound in booklet form. For purposes of scoring
and calculations, four questions of the original two hundred one were
omitted. In the course of giving the schedule it was found that three
questions, #26, #106, and #163, were ambiguous. It was believed
that the maladjusted answer to question #27 as given by Cowan was
obviously incorrect for high-school and university subjects and it was
omitted in the scoring. After striking out these four statements the
reliability was calculated. There were ninety-nine statements on 
the odd-numbered pages and ninety-eight on the even-numbered 
pages. Scores on odd-numbered pages were correlated against scores 
on even-numbered pages. For high-school subjects the reliability 
coefficient between the two halves was found to be +.80 ± .02 and
for the total test, using Spearman’s formula, +.89; for university
subjects the corresponding figures were found to be +.90 ± .01 and
+.95.
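
Assuming that “Spearman’s formula” refers to the Spearman-Brown correction for a test of doubled length, the step from the half-test correlations to the full-test reliabilities can be sketched as follows. The function name is hypothetical.

```python
def spearman_brown(r_half, k=2):
    """Estimated reliability of a test k times as long as the part
    whose reliability is r_half (k = 2 for split halves)."""
    return k * r_half / (1 + (k - 1) * r_half)


print(round(spearman_brown(0.80), 2))  # 0.89, high-school subjects
print(round(spearman_brown(0.90), 2))  # 0.95, university subjects
```

Both results agree with the total-test figures reported in the text.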

The Questionnaire—A short questionnaire was bound with the 
personality schedule as the ninth page of the booklet. One question
asked was, “Did you skip one or more grades in elementary
school? If so, which one or ones? If so, do you think it has been a
handicap to you in a social way?” In order to eliminate a possible
suggestion that a study was being made of accelerated students and
their “peculiarities,” a similar question was asked concerning failures
in elementary school. Subjects were asked to indicate the number 
of social activities in which they had engaged during the past two 
weeks, classifying their answers under seven headings: (A) Parties 
without dancing; (B) Dances or parties with dancing; (C) Theatre or 
movies; (D) Club, committee, Scout, etc. meetings not connected with 
the school; (E) Club, committee, etc. meetings connected with the
school; (F) All other activities, if any; (G) How many of the above
activities were engaged in as “dates” with the opposite sex? High-
school subjects were asked to sign their names. University subjects
were asked to sign their names on a numbered card, a corresponding 
number on the booklet being the only means of identification. 

Securing Data.—The writer attended personally to the giving of the 
schedule and questionnaire. High-school subjects were called from 
classes and study halls and the material was presented to them in 
groups. Every effort was made to make the subjects feel that the 
schedule was in no way an examination of the usual school type. 
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Emphasis was placed upon the fact that there were no right or wrong 
answers. It was explained to them that although they signed their 
names their answers would not be known to their teachers. 

A different technique was used at the university level. A postal 
card was sent to those students who had entered the university at age 
sixteen or younger asking them to report to the psychological clinic 
on a given day. The names of those who responded were then paired 
with the names of students who had entered the university at age 
eighteen and similar cards were sent to this group. It was explained 
to each subject that the material was for scientific purposes only and 
that neither the instructors nor any one else who knew him would 
know how he answered the questions. Also it was explained that the 
questions were made out in terms of the vocabulary of high-school 
pupils. 

Securing Individual Reactions on Being Socially Handicapped.— 
An attempt was made to secure more detailed responses from those 
subjects who stated on the questionnaire that they believed skipping 
grades in elementary school had been a social handicap to them. At 
the high-school level, so far as was possible, the writer interviewed 
these subjects and in a short informal talk asked them to write down 
why they had answered this question as they did. University subjects 
indicating that they believed themselves to be socially handicapped by 
acceleration and a few high-school subjects who could not be inter- 
viewed were sent a letter asking for the same information. 


STATISTICAL TREATMENT OF DATA SECURED FROM ACCELERATED 
AND NON-ACCELERATED SUBJECTS 


The grade location and sex distribution of the one hundred pairs 
of high-school subjects and sixty-four pairs of university subjects 
finally selected for intensive study is given in Table I. The subjects 
included in Table I have furnished all data to be considered in this
section except that, in the part dealing with subjects matched in age 
instead of grade location, a few additional subjects have been used. 

Distribution of Mean Scores on the Personality Schedule.—In order 
to determine whether or not there were significant differences between 
the personality schedule scores of accelerated and non-accelerated 
subjects, the calculations indicated in Table II were made. It will 
be seen from Table II that the ratio of difference to standard deviation 
of the difference is in no case statistically significant. 
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All high-school and university subjects (accelerated and non- 
accelerated combined) were grouped according to sex. The mean 
scores and the reliabilities of the differences are indicated in Table III. 
It is clear that for these age groups girls and women do make higher 
(more maladjusted) scores than boys and men. 


TABLE I.—GRADE LOCATION AND SEX DISTRIBUTION OF PAIRS OF SUBJECTS USED
IN CALCULATIONS

TABLE II.—RELIABILITIES OF DIFFERENCES IN MEAN PERSONALITY SCHEDULE
SCORES OF ACCELERATED AND NON-ACCELERATED SUBJECTS
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The mean score of all accelerated and non-accelerated high-school 
subjects combined was found to be 40.1; while for all accelerated and 
non-accelerated university subjects combined the mean score on the 
schedule was found to be 42.3. This difference of only 2.2 points
seems to indicate that maturation in itself does not appreciably affect 
the scores on the personality schedule. 
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The mean personality schedule score for all accelerated high-school 
subjects was 39.8, for non-accelerated subjects it was 40.4. The 
corresponding figures for university subjects were 43.6 and 41.0, 
respectively. The mean score for all accelerated subjects (high school 
and university) was 41.3 and for all non-accelerated subjects it was 
40.6, a difference of 0.7. The standard deviation of this difference was 
1.95 and the difference divided by the standard deviation of the differ- 
ence was 0.36. 
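
The ratio reported above is the difference between two means divided by the standard deviation of that difference. A minimal sketch of the arithmetic, using the figures given in the text and a hypothetical function name, follows.

```python
def critical_ratio(mean_a, mean_b, sd_of_difference):
    """Difference between two means divided by the standard
    deviation of the difference."""
    return (mean_a - mean_b) / sd_of_difference


# All accelerated subjects (41.3) against all non-accelerated subjects (40.6).
print(round(critical_ratio(41.3, 40.6, 1.95), 2))  # 0.36
```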

In order to determine whether or not there was any relationship 
between the personality schedule scores of accelerated and non- 


TABLE III.—RELIABILITIES OF SEX DIFFERENCES IN MEAN PERSONALITY
SCHEDULE SCORES
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accelerated subjects, correlations were calculated. It must be borne 
in mind that the pairs of subjects whose scores were correlated were 
closely matched in general intelligence as measured by intelligence 
tests, school marks, sex, and grade location. The correlation between 
the personality schedule scores of accelerated and non-accelerated 
high-school subjects was found to be +.14 ± .07. For university
subjects the correlation was found to be +.03 ± .08.
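
The second figure attached to each correlation (.07 and .08 here) appears to be a probable error, although the source does not say so. As a sketch under that assumption, the usual formula, 0.6745 times one minus the squared correlation, divided by the square root of the number of pairs, reproduces the reported values. The function name is hypothetical.

```python
import math


def probable_error_r(r, n_pairs):
    """Probable error of a correlation coefficient based on n_pairs pairs."""
    return 0.6745 * (1 - r * r) / math.sqrt(n_pairs)


print(round(probable_error_r(0.14, 100), 2))  # 0.07, high-school pairs
print(round(probable_error_r(0.03, 64), 2))   # 0.08, university pairs
```

Neither correlation exceeds its probable error by much, which is in keeping with the conclusion that no relationship is apparent.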

In general it may be said that these comparisons indicate that there 
is no apparent relationship between the personality schedule scores 
of those entering high school or university younger than the usual age 
and those entering at the usual age. 

Comparison of Personality Schedule Scores of Subjects Expressing
Opinions That Elementary-school Acceleration Had or Had Not Been a
Social Handicap.—In order to determine whether or not scores on the
personality schedule were related to a feeling of being socially handi- 
capped, the scores made by those who stated that skipping grades in 
elementary school had been a social handicap were compared with 
scores made by those who stated that skipping grades in elementary 
school had not been a social handicap. Data were compiled for 
seventy-nine high-school subjects and sixty university subjects who 
definitely stated that they had skipped grades in elementary school 
and who expressed the opinion that such acceleration had or had not 
been a handicap in a social way. Of the thirty-six high-school boys 
expressing such opinions, eight (22.2 per cent) stated that acceleration 
had been a social handicap; seven (16.3 per cent) of the forty-three 
high-school girls expressed this opinion. Of the twenty-six university 
men expressing such opinions, eleven (42.3 per cent) stated that 
acceleration had been a social handicap; five (15.1 per cent) of the 
thirty-three university women expressed this opinion. It will be 
noted that at both educational levels higher percentages of males than 
of females expressed the opinion that acceleration had been a social 
handicap. Furthermore, the percentage for university men was much 
greater than for high-school boys. However, the numbers on which 
these comparisons are based are very small. 

Table IV shows the distribution of mean total scores for those who 
expressed the opinions that acceleration had or had not been a social
handicap. It will be noted that in every group the scores made by 
those who believed acceleration had been a social handicap indicate 
greater personality maladjustment than the scores made by those who 
believed acceleration had not been a social handicap. Furthermore, 
the differences are greater for university than for high-school subjects. 
It is evident that the scores on the personality schedule are a measure 
of something related to a feeling of being socially handicapped. 

The mean score of all high-school boys and girls believing accelera- 
tion to be a handicap was 47.5; for those believing acceleration not 
to be a handicap the mean score was 37.4. The corresponding figures 
for university subjects were 57.3 and 39.9. As has been indicated 
previously, the mean score of all subjects entering high school at age 
fourteen was 40.4 and the mean score for all university subjects
entering the university at age eighteen was 41.0. Thus it will be seen 
that the mean scores for subjects entering high school and university 
at the usual ages are much nearer the mean scores of all those acceler- 
ated subjects who said skipping grades had not been a social handicap 
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than they are to the mean scores of all those accelerated subjects who 
said skipping grades had been a social handicap. 


TABLE IV.—RELIABILITIES OF DIFFERENCES BETWEEN
SCHEDULE SCORES OF ACCELERATED SUBJECTS BELIEVING ACCELERATION TO
BE A SOCIAL HANDICAP AND ACCELERATED SUBJECTS BELIEVING
ACCELERATION NOT TO BE A SOCIAL HANDICAP

Group             Opinion          N    Mean   SD      Diff.   SD of diff.   Diff./SD of diff.
University men    Handicap         11   51.3   24.95
University women  Handicap          5   70.6   16.85
                  Not a handicap   28   44.3   20.29   26.3    8.45          3.11











Social Activities of Accelerated and Non-accelerated Subjects.—In 
order to supplement data found on the personality schedule and in 
order to make comparisons, data were secured on social activities. On 
the questionnaire, subjects were asked to indicate the number and kind 
of social activities in which they had engaged during the arbitrarily 
chosen period of the previous two weeks. 

Correlations between personality schedule scores and the number of 
social activities participated in were calculated. In calculating the 
total number of social activities for each subject, attendance at the 
theatre or movies was not included. Although such attendance may 
be a social activity, it may be quite the opposite. “Dates” were not
included because of probable duplication in the other classifications. 
A small number of social activities was correlated against low (well- 
adjusted) personality schedule scores. For all high-school boys the 
correlation was -.076 ± .071; for all high-school girls the correlation
was -.124 ± .063. The corresponding figures for university men
and women were -.064 ± .090 and +.035 ± .079, respectively.
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Comparisons were made of the percentages of the various groups 
participating in given social activities one or more times during a 
period of two weeks. The data are to be found in Table V. An 
example of the reading of this table will serve to make clear the mean- 
ing: 26.7 per cent of the forty-five high-school boys who entered high 
school at age twelve attended one or more parties at which there was 
no dancing while 35.6 per cent of the forty-five boys who entered high 
school at age fourteen attended such parties during the period of two 
weeks. Differences between percentages for accelerated and non- 
accelerated subjects are statistically unreliable in all cases although 
some of the differences are suggestive. 


TABLE V.—PERCENTAGES OF GROUPS PARTICIPATING ONE OR MORE TIMES IN
CERTAIN SOCIAL ACTIVITIES DURING A PERIOD OF TWO WEEKS

                                              High school               University
                                           Boys        Girls         Men         Women
Entering age                             12    14    12    14     16    18     16    18
Number of subjects                       45    45    55    55     28    28     36    36
A. Parties without dancing             26.7  35.6  29.1  34.5   35.7  46.4   58.3  50.0
B. Dances or parties with dancing      24.4  28.9  45.5  49.1   50.0  64.3   52.8  47.2
D. Meetings not connected with the
   school                              44.4  46.7  47.3  49.1   21.4  32.1   47.2  41.7
E. Meetings connected with the school  42.2  40.0  41.8  50.9   39.3  46.4   58.3  58.3
F. All other activities, if any        44.4  44.4  47.3  25.5   50.0  50.0   58.3  55.6
G. Above activities as “dates”         31.1  48.8  40.0  56.4   64.3  75.0   80.6  80.6





Also the mean number of all activities in which members of the 
different groups engaged was obtained by dividing the total number 
of activities for a group by the number in the group. “Dates” were
not included because of probable duplication. In the high-school
groups the mean for accelerated boys was 6.5, for non-accelerated boys
it was 6.4, and for accelerated girls the mean was 6.6 while for non-
accelerated girls it was 7.0. For university men the mean number of
activities was found to be 7.1 for the accelerated group and 7.9 for the
non-accelerated group. The means for both the accelerated and
non-accelerated groups of university women were 8.9. For activities
engaged in as “dates” the means for accelerated and non-accelerated
high-school boys were 0.8 and 1.1, respectively; the corresponding
means for high-school girls were 1.8 and 2.1. The mean numbers of
“dates” for accelerated and non-accelerated university men were 2.7
and 3.0; the corresponding means for university women were 4.0
and 4.1.

Social Activities of Subjects Expressing Opinions that Elementary- 
school Acceleration Had or Had Not Been a Social Handicap.—Com- 
parisons similar to the above were made for subjects expressing the 
opinions that elementary-school acceleration had or had not been a 
social handicap. The data are given in Table VI. 


TABLE VI.—PERCENTAGES OF SUBJECTS (THOSE EXPRESSING OPINIONS THAT
ELEMENTARY-SCHOOL ACCELERATION HAD OR HAD NOT BEEN A SOCIAL
HANDICAP) PARTICIPATING IN CERTAIN SOCIAL ACTIVITIES DURING
A PERIOD OF TWO WEEKS

                                          High school                   University
                                       Boys          Girls           Men           Women
                                     H      N      H      N       H      N       H      N
D. Meetings not connected with the
   school                           62.5   50.0   42.9   47.2    27.2   20.0    60.0   42.9
F. All other activities, if any     37.5   46.4   85.7   47.2    54.5   60.0    80.0   50.0
G. Above activities as “dates”      37.5   32.1   42.9   41.7    81.8   53.3    40.0   89.3

H = acceleration believed a handicap; N = acceleration believed not a handicap.

















The mean number of all social activities participated in was calcu- 
lated. It was found that high-school boys who thought acceleration 
had been a social handicap had a mean of 7.1 and those who thought 
acceleration had not been a social handicap had a mean of 6.9. The 
corresponding figures for high-school girls were 10.6 and 6.8. The 
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mean for university men who thought acceleration had been a handicap 
was 8.6 and for men who thought it had not been a handicap the mean 
was 6.5. The corresponding means for university women were 9.2 
and 9.0. The mean number of “dates” for high-school boys who 
thought acceleration had been a handicap was 0.8 and for boys who 
thought acceleration had not been a handicap the mean was 1.0. 
The corresponding means for high-school girls were 1.0 and 2.3. The 
mean number of “dates” for university men who thought acceleration
had been a handicap was 4.2 and for men who thought it had not been 
a handicap the mean was 1.9. The corresponding means for university 
women were 2.4 and 4.3. 

Although the numbers of subjects are too small to give reliable 
data, in general there was evidence that those who believed accelera- 
tion to be a social handicap were at least as active socially as those 
who believed acceleration not to be a social handicap. 

Comparison of Personality Schedule Scores and Social Activities of 
Accelerated and Non-accelerated Subjects Matched in Age, Sex, Intelli- 
gence Rating, and School Marks.—In addition to the comparisons 
which have been made of groups matched in sex, grade location, 
intelligence rating, and school marks, another comparison has been 
made in which differently matched pairs have been used. In the 
previous sections accelerated subjects have been compared with non- 
accelerated subjects who were older than they. In the present section 
accelerated subjects are compared with non-accelerated subjects who 
are practically the same age. The matching for the present section 
has been done on four bases: (1) Sex, (2) Chronological age, (3) Intelli- 
gence rating as measured by standardized intelligence tests, (4) School 
marks. The variable is grade location. Subjects were considered 
to be matched in chronological age if their ages were within six months 
of each other. The usual matching was accelerated junior with non- 
accelerated freshman and accelerated senior with non-accelerated 
sophomore.

In order to increase the number of pairs to be compared in this 
section a few previously unmatched subjects were used, but most of 
the subjects here included are the same as those used in previous sec- 
tions. However, they are matched in different combinations. In 
matching at the high-school level, subjects from a given school were 
matched with subjects from the same school. Fifteen pairs of high- 
school boys and eighteen pairs of high-school girls, total thirty-three 
pairs; and eight pairs of university men and eight pairs of university 
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women, total sixteen pairs, were secured by the new matching tech- 
nique. Comparisons of personality schedule scores are given in 
Table VII. It will be noted that there are no significant differences 
between accelerated and non-accelerated subjects, and that the number 
of subjects in each group is small.


TABLE VII. RELIABILITIES OF DIFFERENCES IN PERSONALITY SCHEDULE SCORES
OF ACCELERATED AND NON-ACCELERATED SUBJECTS MATCHED IN AGE,
SEX, INTELLIGENCE RATING, AND SCHOOL MARKS

                Entering   Mean    SD of   Diff. mean   SD diff.      Diff./SD
                age        score   dis.    scores       mean scores   of diff.

High school
  Boys          12         40.7    17.5
                14         [illegible]
  Girls         12         43.1    12.1
                14         40.4    12.7    2.8          4.18          0.68
University
  Men           16         37.4    17.1
                18         34.6    15.4    2.8          8.12          0.34
  Women         16         45.1    19.8
                18         46.1    12.3    1.0          8.25          0.12
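The last column of Table VII is the difference between two means divided by the SD of that difference. The sketch below recomputes it for the university men. The function names are illustrative, and the independent-samples formula in `sd_of_difference` is an assumption: the source does not state how the SD of the difference was obtained, and the subjects were in fact matched pairs.

```python
import math


def critical_ratio(mean_a, mean_b, sd_diff):
    """Absolute difference between two means divided by the SD of that difference."""
    return abs(mean_a - mean_b) / sd_diff


def sd_of_difference(sd_a, n_a, sd_b, n_b):
    """SD of a difference between two means, assuming independent samples.

    This formula is an assumption made for illustration; it ignores any
    correlation introduced by matching the pairs.
    """
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


# University men, Table VII: means 37.4 and 34.6, printed SD of difference 8.12.
ratio_men = critical_ratio(37.4, 34.6, 8.12)

# Approximate SD of the difference from the printed SDs (17.1, 15.4), eight pairs.
approx_sd_men = sd_of_difference(17.1, 8, 15.4, 8)

print(round(ratio_men, 2), round(approx_sd_men, 2))
```

A ratio this far below the conventional thresholds is what the text means by "no significant differences."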




















Table VIII indicates the percentage of each group participating 
in certain social activities at least once during a period of two weeks. 
Some of the differences between accelerated and non-accelerated sub- 
jects are of interest, but the numbers of subjects in the various groups 
are too small to give statistically reliable differences. 

The mean number of all social activities participated in during a 
period of two weeks was found to be 8.3 for accelerated high-school 
boys and 6.0 for non-accelerated boys. For accelerated high-school 
girls the mean was 7.3 and for non-accelerated girls it was 6.1. At the 
university level, the mean for accelerated men was 7.6 and for non- 
accelerated men it was 5.9, while for accelerated women the mean was 
12.3 and for non-accelerated women it was 7.0. For activities engaged 
in as “dates” the means for accelerated and non-accelerated high-school
boys were 0.7 and 0.5, respectively; the corresponding means for
high-school girls were 3.3 and 1.1. The mean numbers of “dates”
for accelerated and non-accelerated university men were 2.4 and 2.3,
respectively, the corresponding means for university women being
5.3 and 3.6.

On the whole there was some indication that accelerated subjects
were more active socially than non-accelerated subjects of their own
age.


TABLE VIII. PERCENTAGES OF SUBJECTS MATCHED IN AGE, SEX, INTELLIGENCE
RATING, AND SCHOOL MARKS PARTICIPATING IN CERTAIN SOCIAL
ACTIVITIES DURING A PERIOD OF TWO WEEKS

                                          High school                 University
                                        Boys         Girls          Men          Women
Entering age                          12     14    12     14     16     18     16     18
Number of subjects                    15     15    18     18      8      8      8      8

A. Parties without dancing           20.0   33.3  33.3   33.3   37.5   37.5   87.5   50.0
B. Dances or parties with dancing    33.3    6.7  55.6   50.0   50.0   62.5   50.0   62.5
C. Theaters or movies               100.0   93.3  88.9   83.3  100.0   87.5  100.0  100.0
D. Meetings not connected with the
   [school]                          40.0   40.0  38.9   44.4   37.5   25.0   75.0   37.5
E. Meetings connected with the
   [school]                          40.0   53.3  33.3   77.8   37.5   50.0   87.5   50.0
F. All other activities, if any      46.7   33.3  61.1   11.1   50.0   50.0   62.5   25.0
G. Above activities as “dates”       [first four entries illegible]    50.0   62.5   75.0   87.5











SOME INDIVIDUAL REACTIONS OF SUBJECTS TO THE PROBLEM 
OF SCHOOL ACCELERATION 


Informal explanatory statements were secured from subjects who 
expressed the opinion that elementary-school acceleration had been a 
social handicap to them. Such statements were secured from ten girls 
and four boys at the high-school level and from five women and five 
men at the university level. It was found that the statements could 
be classified roughly under five headings. Space will not be taken to 
reproduce all of these very interesting statements but one typical 
quotation will be given under each heading. 

1. Difficulties Resulting from Home Conditions.—One freshman 
high-school girl stated that although she did the same things in school 
activities as her friends did, 













538 The Journal of Educational Psychology 


. . . my mother does not think I am old enough to go to parties and dances
with them. She thinks that I am still a baby and that I am too young to go out. 
Naturally when I see my friends going out I want to go too, but mother always 
reminds me that I am a year younger than they and tells me that I cannot go. 


2. Difficulties in Relationships with the Opposite Sex.—A sophomore 
high-school boy expressed his opinion as follows: 


The skipped person gets to high school more or less of a runt physically. Being 
small handicaps the boy greatly if he has any athletic aspirations and thus lowers 
his self respect. Also a physical runt is looked down upon by members of the 
opposite sex and consequently is excluded from social activities. 


3. Difficulties in General Social Relationships.—Closely related to
the problem of relationships with the opposite sex is the general prob- 
lem of social relationships with classmates. A university man who 
had been accelerated wrote: 


I was already slightly self-conscious and this advancement made me more so.
. . . In high school I was just too young and too small to enjoy the extra-curricular
activities. 

4. Resentment at Being Considered As “Different” from Classmates.
A university woman wrote: 


People always expect me to be at the head in everything. I don’t care if I 
don’t make the best grades, and when I say I’m worried over a test or exam they 
say, “Oh, you don’t have to worry: you’ll make A plus.” People think you are
just naturally different if you skip a grade or two. 


5. Decreasing Difficulties with Maturation.—A senior university 
woman stated: 


Most of the social disadvantages which have resulted from my double promotions
in grammar school came in earlier years.


No attempt was made to treat these informal statements statisti- 
cally, but such statements as the above are certainly of significant 
interest to the student of personality disturbances. 


SUMMARY AND CONCLUSIONS 


In order to hold constant certain factors which most investigators 
have permitted to exist as variables, the subjects of the present study 
were divided into pairs, each pair being matched on four bases: 
(1) Sex, (2) Grade location, (3) Intelligence rating, (4) School marks. 
Each high-school pair included one student who had entered high 
school at age twelve or younger and one who had entered at age fourteen.
At the university level each pair consisted of one student who
had entered the university at age sixteen or younger and one who had
entered at age eighteen. Most of the data were secured from two
hundred high-school and one hundred twenty-eight university students.
The Cowan Adolescent Personality Schedule was used to secure
a measure of the personality adjustment, a short questionnaire was 
used to secure data on number and kinds of social activities partici- 
pated in, and personal interviews and personal letters from subjects 
were used to determine possible causes of maladjustments. Although 
very few statistically significant differences have been found, a number 
of interesting tendencies have been suggested by the data. 

One of the most interesting findings of this research developed in a 
comparison of the personality schedule scores of subjects expressing 
opinions that elementary-school acceleration had or had not been a 
social handicap. It became evident that the scores on the schedule 
were a measure of something related to a feeling of being socially 
handicapped. The scores of subjects believing that acceleration had 
been a social handicap were found to be appreciably higher (more 
maladjusted) than the scores of subjects believing that acceleration 
had not been a social handicap. 

Briefly summed up, the work of this research has suggested the 
following tentative answers to some of the problems posed by
school acceleration. 

(1) Personality adjustment, as measured by a personality schedule, 
is probably not appreciably affected by the single factor of school 
acceleration. 

(2) Although the data are not statistically reliable, on the whole 
there is a suggestion that accelerated students are not quite so active 
socially as their non-accelerated classmates. 

(3) When compared with students of their own chronological age, 
accelerated students seem to be at least as active socially as non- 
accelerated students. 

(4) Although some accelerated students believe that they have 
been socially handicapped by their acceleration, they seem to be as 
active socially as those accelerated students believing that they have 
not been handicapped. 

(5) For students believing themselves to be socially handicapped 
the difficulty seems to result from a feeling of being different from their 
fellows. This difficulty is often aggravated by parents and others who 
point out and emphasize whatever differences may be present. 














GENERALITY AND SPECIFICITY OF 
CONSERVATISM-RADICALISM 


THEODORE F. LENTZ 
Washington University, St. Louis, Missouri 


May, Hartshorne,¹ Symonds,² Watson,³ Murphy,⁴ Allport,⁵ and
others have submitted data and arguments bearing on the issue of the 
specificity versus the generality of character. The arguments and the 
data lead to the tentative conclusions that 

(a) Specificity is a matter of degree, that is, there is some generality, 
and some specificity about each real trait; where there is no generality 
whatever there can be no trait. 

(b) The amount of specificity or generality varies from trait to 
trait. 

It is further apparent that the measure of specificity or generality 
is found in the intercorrelations among various so-called manifestations
of the given trait: the lower the r, the greater the specificity.

It is the purpose of this article to report correlational data within 
one trait, that of conservatism. The intercorrelations constitute the 
measure of relations among six scores representing as many fields or 
phases of this trait. The scores used are based upon the following 
armchair groups of items: Education, religion, government, sex, 
non-social, and general. These groups are illustrated by the following 
sample items in the order named: 


1. Our schools and colleges should devote twice as much attention to the 
development of the artistic taste. 

2. Telling a lie is worse than taking the name of God in vain. 

3. Voters should disregard the party and vote for the man. 

4. Women in general are not as intelligent as men. 

5. There is no probability that the artificial production of milk or milk 
substitute will do away with the cow. 

6. Much more energy should be expended in conserving what mankind 
does know than in discovering what it does not know. 


There were one hundred ninety items in the six groups. No item 
appeared in more than one group. The data here reported are based 
on the reactions* of five hundred seventy-nine college students to these 





* The subjects were required to indicate agreement or disagreement to each 
statement item. The score is the sum of all the conservative statements, as 4, 5, 
and 6 above, agreed to, plus the radical statements, as 1, 2, and 3 above, disagreed 
to. 
one hundred ninety items when presented to them in a more inclusive
list of four hundred thirty-seven items, all designed to measure 
conservatism. They were presented in two forms, known as experi- 
mental forms Z and F—forerunners of the present forms J and K 
of the C-R Opinionaire, developed by the writer and colleagues of the 
Character Research Institute of Washington University. The number 
of items in each group and the reliability of the scores as determined by 
the split-half Spearman-Brown correction method are to be found in
Table I.

TABLE I

Number                   Split-half   Probable   Corrected for   Number
of items   Field         r            error      whole test      of cases

46         Education     .365         .024       .535            579
30         Religion      .618         .017       .764            579
34         Government    .541         .019       .702            579
39         Sex           .390         .024       .561            579
22         Non-social    .453         .022       .624            579
20         General       .387         .024       .558            579
           (median)      .421         .024       .592
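The "corrected for whole test" column of Table I follows from the split-half coefficients by the Spearman-Brown formula for a test of doubled length, 2r / (1 + r). The sketch below recomputes that column. The function name is illustrative, and the labels "Sex" and "General" for the fourth and sixth rows are inferred from the order in which the fields are named in the text, since the printed labels are illegible.

```python
def spearman_brown(r, k=2.0):
    """Reliability of a test lengthened k times, given reliability r of the original."""
    return k * r / (1.0 + (k - 1.0) * r)


# Split-half coefficients from Table I (row labels partly inferred, see above).
split_half = {
    "Education": 0.365,
    "Religion": 0.618,
    "Government": 0.541,
    "Sex": 0.390,
    "Non-social": 0.453,
    "General": 0.387,
}

# Step each half-test coefficient up to the full-length test (k = 2).
corrected = {field: round(spearman_brown(r), 3) for field, r in split_half.items()}

for field, value in corrected.items():
    print(f"{field:<12}{value:.3f}")
```

The recomputed values agree with the printed column to three decimals.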




















Crude inter-correlations among these six scores are presented in
Table II.

TABLE II

              Religion   Government   Non-social   General   Sex

Education     .341       .459         .436         .258      .402
Religion                 .438         .449         .503      .515
Government                            .472         .433      .435
Non-social                                         .473      .461
General                                                      .491
















Probable errors for these fifteen correlations range from .020 to
.025. By the formula $r_{\text{corr}} = r_{12}/\sqrt{r_{1} r_{2}}$, in which $r_{1}$ and $r_{2}$
are the reliabilities of the two tests, these correlations were corrected
for attenuation due to unreliability of the separate tests.
These corrected correlations appear in Table III.
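The correction for attenuation can be checked from the printed figures. The sketch below divides a crude intercorrelation from Table II by the square root of the product of the two corrected reliabilities from Table I. The function name is illustrative, and the pairing of values with fields assumes the row order inferred for Tables I and II.

```python
import math


def correct_for_attenuation(r12, rel1, rel2):
    """Crude correlation divided by the geometric mean of the two reliabilities."""
    return r12 / math.sqrt(rel1 * rel2)


# Education vs. Government: crude r = .459, reliabilities .535 and .702.
edu_gov = correct_for_attenuation(0.459, 0.535, 0.702)

# General vs. Sex: crude r = .491, reliabilities .558 and .561.
gen_sex = correct_for_attenuation(0.491, 0.558, 0.561)

print(round(edu_gov, 3), round(gen_sex, 3))
```

Both results match the corresponding entries of Table III (.749 and .878), which supports reading the reliabilities in the formula as the whole-test values rather than the split-half values.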
TABLE III

              Religion   Government   Non-social   General   Sex

Education     .532       .749         .754         .473      .734
Religion                 .598         .551         .770      .786
Government                            .713         .692      .694
Non-social                                         .802      .779
General                                                      .878




















It will be noted that the median of these fifteen corrected correla- 
tion coefficients is .73. While actual correlation coefficients have 
seldom been presented, the general impression from the literature is 
that the different fields of conservatism have much less in common than 
is indicated by this figure of .73. As reported by Murphy,® a study 
by George’ is one study which argues most strongly for generality and 
yields an intercorrelation of .55 between liberalism, international and 
national. Apparently, nothing was said about the reliability or 
unreliability of the two measures. It is the contention of the writer 
that uncorrected intercorrelations between groups of conservatism 
items are not sufficiently accurate measures of the degree of generality 
or communality involved. Low intercorrelations between the separate 
fields of the trait of conservatism-radicalism may be due to the
reliability of the so-called specific measure, as well as to the specificity of
the continuum being measured. The way in which the trait is 
conceived or measured may affect either cause. As Allport points out, 
the failure of certain traits to reveal an appreciable amount of general- 
ity is due to the faulty conception of the trait itself. The improve- 
ment in the conception and measuring of the different traits should 
affect both the degree of specificity and the degree of reliability. 

With regard to the factor of unreliability, Tables I, II, and III 
tell the story. In Table II, the intercorrelations appear very low, 
with a median of .42. However, in Table I is to be found a partial 
explanation in the low reliability of the batteries of these separate 
conservatisms. Here we find a median correlation of approximately 
.60. Roughly speaking, one might say that the discrepancy between 
.42 and .60 may be taken as an indication of the amount of the factor 
of specificity among the separate conservatisms. Were the two corre- 
lations equal, we might conclude that the separate tests were measuring 
one and the same thing. 
Occasionally subjects object to being measured for conservatism
on the grounds that they are “conservative in some fields of thought,
and radical in others.” This surface argument against the generality
of conservatism ignores the fact that this phenomenon may be true 
not only between two items representing separate fields, but also 
between two items within the same field. Low correlation between a 
religious item and a political item has to be considered in connection 
with low correlation between two religious items or between two 
political items. Low correlation between individual items can be used 
as argument for specificity only in so far as one can argue specificity 
with regard to almost any psychological continuum, such as spelling 
ability, general intelligence, or vocabulary. 

As an illustration of this point, the writer took twelve words from 
a weekly spelling test used in the upper grades. These words were 
chosen at random, except that an effort was made to assemble the 
words with a high balance, that is, with as large a percentage of persons 
passing as failing. The average of these sixty-six intercorrelations 
among the twelve words was .19 (by the four-fold formula), but the 
correlation between the odd six and the even six was .65. A com- 
parable set of forty-eight words then should correlate .93 with another 
comparable set of forty-eight words. In other words, the principle by 
which we predict reliabilities and by which we build a reliable test; 
namely, the extension of sample, operates in the direction of generality. 
That is, to support the theory of generality, it is only necessary to 
summate the reactions to a number of specifics. This is true whenever 
the initial correlations are appreciably above zero. It simply means 
that where intercorrelation or generality exists at all, it can be increased 
by increasing the sample. Had we increased the number of specifics 
within the groups of Table I, we could have considerably increased 
the median intercorrelation among the various groups of conservatism 
items. According to Table III, the theoretical limit of this increase 
would have been to .734, providing we added no new element to 
(or changed the proportions in) what we had already in the initial 
batteries. Specific is a term which applies to items, in contrast to 
each other, much more than to groups of items. 
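The spelling illustration rests on the Spearman-Brown prophecy formula. A minimal sketch follows, assuming (as the text implies) that a forty-eight-word set is treated as a six-word set lengthened eight times. The function name is illustrative. The second calculation, which predicts the six-versus-six correlation from the average inter-word correlation of .19, is an added check that assumes equal item variances; it is not a figure the article reports.

```python
def spearman_brown_prophecy(r, k):
    """Predicted correlation between two comparable tests, each k times as long."""
    return k * r / (1.0 + (k - 1.0) * r)


# Two six-word sets correlate .65; forty-eight words is eight times that length.
r_48_words = spearman_brown_prophecy(0.65, 8)

# Added check: correlation between two six-word sets implied by an average
# inter-word correlation of .19 (assumes equal item variances).
r_6_from_items = spearman_brown_prophecy(0.19, 6)

print(round(r_48_words, 3), round(r_6_from_items, 3))
```

The first result is close to the .93 quoted in the text. The second falls somewhat below the observed .65, which is the kind of gap one would expect from sampling fluctuation and from averaging four-fold coefficients.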

This principle of generality being achieved by the summation of 
specific items can be illustrated from another project® under the 
writer’s direction in attitude measurement. Two hundred question- 
naire-opinionaire items were selected for the measurement of minority- 
mindedness, or atypicality. This battery showed a reliability of .89. 
The fifty best items out of this battery, when treated as separate items,
showed an average paired item intercorrelation of .19 by the tetra- 
choric formula (.11 by the four-fold formula). Each item appears to 
measure something specific. That is, in each pair the items showed a 
very small amount of communality, or over-lapping. When taken 
as a whole, however, this group of items seems to measure something 
very general, not to say unitary. The comparatively low correlation 
of .11 or .19 for the item pair is due to two factors, as suggested above: 

















TABLE IV

                                                                Relation to
Item                                                       Government   Religion

(a) The ballot should not be extended below the age of
    twenty-one                                                 19          17
(b) In Sunday School, chiefly the Bible should be taught       37          68
(c) Children should be encouraged to choose, independently
    of their parents, their own religion                       13          21
(d) The presidential term of office, four years, is as it
    should be                                                  35          30





(a) The unreliability of a single item as a measure of whatever the 
items measure, and (b) the specificity of the items in contrast to each 
other. 

Making liberal allowance, however, for the unreliability of a single 
item, there is still left a large amount of variance to be accounted for 
by the term specificity. The writer’s experience with items of this 
sort in a previous study,® as well as that of other investigators on 
retests material,'!® tends to show a reliability of .60 for single items when 
measured by the retest method. Applying the formula used for 
converting Table II into Table III, we would, therefore, divide the 
above .11 or .19 by .6. This would result in corrected correlations of 
.18 or .30, respectively. This very low corrected intercorrelation 
argues strongly for the specificity of what is measured by a single item. 
However, the specificity of a single item cannot be used as an argument 
for the specificity of a trait such as conservatism. Probably no 
item can be found which measures conservatism and conservatism 
alone. One reaction to a single item proposition undoubtedly would 
be a function of conservatism plus other phases of personality.

Something of the way in which generality among items behaves
can be had from observing the statistical relationship between items 
and the fields to which they do and do not belong. To illustrate the 
process, there is presented below a table showing two items taken from 
the battery for the measurement of government conservatism 
and two items from the battery for the measurement of religious 
conservatism. 

The figures in the columns to the right, under relation to government
and religion, represent the U-L difference values, i.e., a crude
index of relationship with the two batteries of government and religious
items respectively. The figure “19” following item (a) means that
the percentage of the most governmentally conservative one-third of
the population responding conservatively to this item was nineteen
greater than the percentage of the least governmentally conservative
one-third. Likewise, the “17” represents the difference in the percentages
of the upper and lower thirds for religious conservatism
reacting conservatively to the item (a).
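The U-L difference described above can be sketched as follows. The function name, the handling of ties (by sort order), and the nine-person toy data are all assumptions made for illustration; none of the figures below come from the study.

```python
def ul_difference(battery_scores, item_responses):
    """Upper-minus-lower difference in percentage answering an item conservatively.

    battery_scores: one battery score per person (higher = more conservative).
    item_responses: 1 if the person answered the item conservatively, else 0.
    Ties at the boundaries of the thirds are broken by sort order (an assumption).
    """
    n = len(battery_scores)
    third = n // 3
    if third == 0:
        raise ValueError("need at least three subjects")
    order = sorted(range(n), key=lambda i: battery_scores[i])
    lower, upper = order[:third], order[n - third:]

    def percent_conservative(indices):
        return 100.0 * sum(item_responses[i] for i in indices) / len(indices)

    return percent_conservative(upper) - percent_conservative(lower)


# Hypothetical data: nine subjects with battery scores 1..9.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_item = [0, 0, 1, 0, 1, 0, 1, 1, 1]
toy_ul = ul_difference(toy_scores, toy_item)
print(round(toy_ul, 1))
```

Because the index uses only the two extreme thirds, it is quick to tabulate by hand, which is presumably why the text calls it a crude index of relationship.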

While item (a) appears logically as a governmental item and (b) 
appears as a religious item, the latter shows a higher correlation with 
the government battery than does the former; even though it is true 
that the latter shows likewise a much higher relationship to religion. 
Conversely, although item (d) is logically a government item, it shows 
a higher correlation than does (c) with religious conservatism. It 
would seem that (b) and (d) are better measures of general conservatism
and correlate more highly with either field of conservatism, including 
the field in which they apparently do not logically belong. 


SUMMARY 


Reactions to specific situations involving conservative-radical
issues may be largely uncorrelated with each other. These specific
attitudes, however, scarcely merit the term conservatism. The 
median of .73 here reported argues strongly for the validity of the 
concept of general conservatism. There is nothing in the present data 
that precludes the possibility of the development of tests, which, to a 
satisfactory degree, would measure different phases of conservatism. 
Items for measuring such phases, however, will probably need to be 
grouped by some psychological or statistical process and not by the 
logical process used in the classification of items in this study. Proba- 
bly factor analysis applied to a large number of items would reveal 
highly unrelated groups of items for measuring conservatism. 
A RAPID METHOD OF SELECTING TEST ITEMS


M. W. RICHARDSON AND DOROTHY C. ADKINS 
The University of Chicago 


In the effort to improve tests, various ways of analyzing the test by 
items have been proposed. The procedures fall into three types: 
(1) those which utilize the correlation coefficients (or substitutes) 
between the individual items and the total test score; (2) those which 
utilize the correlation coefficients (or substitutes) between the indi- 
vidual items and an external criterion; (3) those which utilize the net 
correlations between the individual items and an external criterion. 

Procedures of the first type may be used when it is desired to make 
a test more homogeneous and to increase the reliability per unit 
length. Numerous methods falling under this type have been pro- 
posed. Most of them share the short-comings of the simple item-test 
coefficient and lack the algebraic convenience of the latter. The 
application of such methods to a test tends to conserve the items most 
highly correlated with the centroid or average of whatever the test 
measures. As Mosier has shown,’ these techniques offer special 
difficulty when the original test happens to contain two or more factors.
In any event, the usefulness of such methods is limited.

Procedures of the second type may be regarded as approximations 
to the third type. Theoretically, the weight to be attached to an item 
is its regression weight, as computed by the methods of multiple corre- 
lation. Actually, of course, items would be weighted unity or zero, 
i.e., be accepted or rejected, by dividing such regression weights into 
two categories in some more or less arbitrary fashion. Regardless of 
what procedure is used, the item indices are subject to sampling 
fluctuations which possibly are of such magnitude that the choice 
among methods is practically a matter of no importance. 

Perhaps the earliest of methods utilizing net correlation between
item and criterion is a method of building successive composites of
items, known as the L-Method. This method was devised by Toops.®
The L-Method is really an approximation to the “ideal” procedure of
examining, for n available items, each of the ($2^n - 1$) possible scales
of all possible scale-lengths (1-item scales, 2-item scales, . . . , n-item
scales) to find that combination of m items which yields the highest
correlation with the criterion when gross scores are added. It does


not require the actual computation of the inter-item correlation 
coefficients but nevertheless takes them into account. Since the
calculations cease when the increments to the composite correlation
become negligible (or negative), this method is to be regarded as a
“build-up” process.

The L-Method operates (by formula) as follows: The variable which 
has the highest individual Pearson correlation coefficient with the 
criterion is chosen as the first one for inclusion. To this one of the 
n variables is added (by formula) each of the remaining (n − 1) variables
in turn; that one which yields the highest composite correlation
is selected as the second variable. Then to the two selected variables
is added each of the remaining (n − 2) in turn, that one yielding the
highest correlation being selected for the next addition to the composite;
and so on. The process may be discontinued at the stage
where the composite correlation ceases to rise appreciably. 

The formula involved is conveniently expressed as follows: 








$$r_{Y(C+U)} = \frac{L_{YC} + L_{YU}}{\sqrt{L_{YY}}\,\sqrt{L_{CC} + 2L_{CU} + L_{UU}}} \qquad (1)$$

where C refers to the chosen composite (or $C = X_1 + X_2 + \cdots + X_{m-1} + X_m$, if m is the number of items chosen), U refers to each
unselected item in turn, Y refers to the criterion; and

$$L_{YU} = N\sum YU - \sum Y \sum U = N\sum Y_R - R\sum Y \qquad (2)$$

in which R refers to the number of persons passing item U, and $\sum Y_R$
refers to the summated criterion scores of those passing item U;

$$L_{UU} = N\sum U^2 - \Big(\sum U\Big)^2 = NR - R^2 = RW \qquad (3)$$

where W refers to the number of persons failing item U;

$$L_{YY} = N\sum Y^2 - \Big(\sum Y\Big)^2; \qquad (4)$$

$$L_{YC} = L_{YX_1} + L_{YX_2} + \cdots + L_{YX_{m-1}} + L_{YX_m} \qquad (5)$$

wherein an individual L is defined as in Equation (3);

$$L_{CC} = L_{X_1X_1} + L_{X_2X_2} + \cdots + L_{X_{m-1}X_{m-1}} + L_{X_mX_m} + 2\big(L_{X_1X_2} + L_{X_1X_3} + \cdots + L_{X_{m-1}X_m}\big); \qquad (6)$$

wherein the individual L's are defined as

$$L_{X_1X_1} = NR_1 - R_1^2 = R_1W_1, \qquad (7)$$

and

$$L_{X_1X_2} = N\sum X_1X_2 - R_1R_2 \qquad (8)$$

in which the term $\sum X_1X_2$ is simply the number of persons succeeding
with both items 1 and 2.


At each selection stage, then, each as-yet-unselected item is tried
out as item U in Formula (1) and that which yields the highest
$r_{Y(C+U)}$ is retained.
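The build-up procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions, not Toops' computing routine: the function names, the stopping tolerance `min_gain`, and the six-person toy data are hypothetical, and every L is computed from raw sums rather than from the pass counts R and W (the two forms are equivalent for items scored 0 or 1).

```python
import math


def cross_product(a, b):
    """L_ab = N * sum(a*b) - sum(a) * sum(b), the form of Equations (2) to (8)."""
    n = len(a)
    return n * sum(x * y for x, y in zip(a, b)) - sum(a) * sum(b)


def l_method(items, criterion, min_gain=1e-9):
    """Greedy build-up selection using Formula (1).

    items: dict mapping item name to a list of 0/1 scores, one per person.
    Returns a list of (item name, composite correlation) in order of selection.
    """
    remaining = dict(items)
    composite = [0] * len(criterion)
    l_yy = cross_product(criterion, criterion)
    chosen, best_r = [], 0.0
    while remaining:
        trial = {}
        for name, u in remaining.items():
            numerator = cross_product(criterion, composite) + cross_product(criterion, u)
            denom_sq = l_yy * (cross_product(composite, composite)
                               + 2 * cross_product(composite, u)
                               + cross_product(u, u))
            if denom_sq > 0:
                trial[name] = numerator / math.sqrt(denom_sq)
        if not trial:
            break
        name = max(trial, key=trial.get)
        # Stop when the composite correlation ceases to rise.
        if chosen and trial[name] - best_r <= min_gain:
            break
        best_r = trial[name]
        chosen.append((name, best_r))
        composite = [c + x for c, x in zip(composite, remaining.pop(name))]
    return chosen


# Hypothetical data: six persons, three items, criterion scores 1..6.
toy_criterion = [1, 2, 3, 4, 5, 6]
toy_items = {
    "A": [0, 0, 0, 1, 1, 1],
    "D": [0, 1, 1, 0, 1, 1],
    "C": [1, 0, 1, 0, 1, 0],
}
selection = l_method(toy_items, toy_criterion)
print([(name, round(r, 3)) for name, r in selection])
```

In this toy case item A is taken first, item D raises the composite correlation, and item C would lower it, so the process stops after two items. The nested loop over all unselected items at every stage is what makes the hand or punched-card version so laborious.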


Another method, based on similar general principles, is Horst’s 
method of Successive Residuals.* 

Both methods are laborious. For all populations and tests to 
which either of the methods is applicable, punched card computing 
techniques must be used. Even then the amount of computational 
labor is enormous, particularly if the appropriate checks are applied. 
After the Hollerith work is completed, the L-Method, for a one hun- 
dred fifty-item test, requires approximately eight hours per item 
selected.? 

In an effort to reduce the computational labor to an amount possi- 
ble in the ordinary test analysis situation, a simple approximation to 
the theoretically better methods was adapted from well-known cor- 
relational procedures. 

If Y is the criterion variable, U is any item, and X is the test 
variable, the formula for the prediction of the criterion from the best 


weighted sum of the item and the test may be written, in terms of 
deviate scores 


$$\frac{\bar{y}}{\sigma_Y} = \frac{r_{YU} - r_{XU}\,r_{XY}}{r_{XY} - r_{YU}\,r_{XU}} \cdot \frac{u}{\sigma_U} + \frac{x - u}{\sigma_{X-U}} \qquad (9)$$

In this formula, $r_{YU}$ is the correlation between the item and the
criterion; $r_{XU}$ is the correlation between the item and the test; $r_{XY}$ is
the correlation between the criterion and the test.

We will consider only the weight

$$\frac{r_{YU} - r_{XU}\,r_{XY}}{(r_{XY} - r_{YU}\,r_{XU})\,\sigma_U}$$

of the item U in equation (9). Obviously $\sigma_{X-U}$ is practically $\sigma_X$. Likewise $\sigma_Y$ is not
an attribute of the item U and need not be considered. The equation

* No comparisons with this method will be presented in this paper. The 
interested reader will find the procedure described in references ' and *. Horst 
has devised another method in which he uses a maximizing function.’ The 
Method of Successive Residuals is so closely cognate in theory to the L-Method 
(although the computational detail seems quite different) that the comparison of 
any other method with the L-Method probably will hold fairly closely for the 
Method of Successive Residuals. 

† Reference *, Formula 344.







gives a weight of unity to the test less the item under consideration.
It can be assumed that any one item is negligible with respect to the
test. The coefficient $r_{YU}$ is, of course, different for each item, and does
not depend on the length or composition of the test. The coefficient
$r_{XU}$ is different for each item and will vary with the length of the test
when, and only when, the centroid of the test is shifted. We shall
assume that, in a situation in which a minority of items is rejected,
the position of the centroid is only slightly affected by the elimination
of items. This assumes that the inclusion or exclusion of any item U
from the test scores does not appreciably alter the value of $r_{XU}$ when
the number of items is large; i.e., that $r_{XU}$ is approximately equal to
$r_{(X-U)U}$. This assumption could do little violence.

The next assumption should present the greatest difficulty, from
an a priori viewpoint. It is now to be assumed that $r_{XY}$ is constant
for the various test lengths produced by the elimination of items.
This assumption presents the unique difficulty of regarding as constant
the selfsame quantity which it is hoped will vary; after all, the objective
of the method is to raise the magnitude of the correlation between
the test and the criterion. Since the (apparent) validity coefficient
increases rapidly with the first few items and does not increase rapidly
thereafter, it may be considered that the assumption of $r_{XY}$ as constant
would be more justifiable if the method were applied to the elimination
of a minor percentage of the worst items. However, it will later
appear that the empirical comparison of the method with Toops’ more
rigorous L-Method is made on the selection of a minor proportion of
the best items. Such a comparison gives the method a severe test.*
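Under the two assumptions above, the short method reduces to computing one index per item from three correlations and the item's standard deviation. The sketch below shows that calculation. The function names are illustrative, the use of sqrt(p(1 − p)) for the standard deviation of a pass-fail item is a standard result rather than something the article states, and the numerical inputs are hypothetical.

```python
import math


def item_sd(proportion_passing):
    """Standard deviation of an item scored 0 or 1."""
    return math.sqrt(proportion_passing * (1.0 - proportion_passing))


def item_weight(r_yu, r_xu, r_xy, sigma_u):
    """Short-method index: (r_YU - r_XU r_XY) / ((r_XY - r_YU r_XU) sigma_U)."""
    return (r_yu - r_xu * r_xy) / ((r_xy - r_yu * r_xu) * sigma_u)


# Hypothetical item: correlates .30 with the criterion and .40 with the test;
# the test correlates .50 with the criterion; half the group passes the item.
toy_weight = item_weight(0.30, 0.40, 0.50, item_sd(0.5))
print(round(toy_weight, 3))
```

Items would then be ranked by this index, so that the whole analysis costs one pass through the items rather than one pass per item selected. An item that correlates highly with the test but only moderately with the criterion gets a low or negative index, since it adds little that the test does not already supply.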

The approximation method was applied to the Ohio State Univer- 
sity Psychological Examination, Form 18, already analyzed by the 
L-Method, for a population of eight hundred. The criterion was a 
measure of scholarship in the first college year. The item weights,

$$\frac{r_{YU} - r_{XU}\,r_{XY}}{(r_{XY} - r_{YU}\,r_{XU})\,\sigma_U},$$

were calculated for each of the one hundred fifty items.

The item values and their rank order for the “better” items by the
short method are presented in Table I for comparison with the fifteen
items selected by the L-Method after one hundred twenty hours of
computation.





* The L-Method works from either end; i.e., it may be used successively to
reject the poorest items as well as to select the best items. The data provided
for comparison involved successive selection of the best items.


                              TABLE I

                  Short method               L-Method
Item no.    Item value    Rank order    Order of selection

  149          .630            1                 1
  147          .453            2            [illegible]
  130          .421            3                 6
   51          .385            4                 *
  150          .380            5                12
   74          .376            6                 2
   39          .356            7                10
    1          .346            8            [illegible]
   55          .329            9                 5
   18          .323           10            [illegible]
   72          .321           11                14
   44          .318           12                13
  122          .258           13                 7
  135          .252           14                11
   95          .249           15            [illegible]
   58          .248           16                 4
   63          .226           17
   57          .209           18                 *
   77          .207           19                15
   66          .203           20                 *














* After the fifteenth item had been selected by the L-Method, twenty-four 
additional items with positive increments to the validity of the composite were 
retained without attempting to arrange them in order of selection. Such arrange- 
ment would have required approximately one hundred ninety-two additional 
hours of time using the L-Method. The items starred were included in this group 
of twenty-four items. 


It may be seen by simple inspection of the table that the short
method selected twelve of the fifteen items laboriously computed to
be best by the L-Method, and that fourteen of the L-Method items
were in the first twenty, as estimated by the short method. The
third item selected by the L-Method is the one item unaccounted for
in Table I.* The L-Method gave fifteen items properly ranked;
then twenty-four items with positive residuals at the fifteenth stage
were tabulated. Of the total thirty-nine items, twenty-five were
included in the first thirty by the short method, and twenty-seven in
the first thirty-nine.

* This item has a high correlation with the test ($r_{xu}$) by virtue of a high average
correlation with the remaining one hundred forty-nine items. Hence the short-method
index was low, ranking eighty-second. On the other hand, its average
correlation with the items previously selected by the L-Method was relatively low,
so that it ranked high by this method.
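Counts of this kind, i.e., the number of items chosen by one method that fall among the first k items ranked by the other, may be obtained mechanically. The following is a minimal sketch in Python; the function name and the item numbers in the example are hypothetical and do not reproduce the data of Table I.

```python
def top_k_overlap(short_ranking, reference_items, k):
    """Count the reference items found among the first k ranked items.

    short_ranking   -- item labels ordered from best to poorest
    reference_items -- items selected by the comparison method
    k               -- number of leading items of short_ranking to examine
    """
    if k < 0:
        raise ValueError("k must not be negative")
    leading = set(short_ranking[:k])
    return len(leading & set(reference_items))


if __name__ == "__main__":
    # Hypothetical item numbers, for illustration only.
    ranking = [7, 3, 9, 1, 5]
    selected = {7, 9, 2}
    for k in (1, 3, 5):
        print(k, top_k_overlap(ranking, selected, k))
```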

Given the two correlation coefficients and the percentage of correct
responses for each item, the computational time does not exceed
ten minutes per item by the short method; this ten minutes is to be
compared with eight hours per item by the L-Method.*
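The comparison may be restated in round numbers. The first line below assumes that the upper limit of ten minutes applies to every one of the one hundred fifty items; the remaining lines merely reproduce the L-Method figures quoted in this paper (one hundred twenty hours for the fifteen ranked items, and an estimated one hundred ninety-two hours to rank the twenty-four additional items).

$$150 \times 10\ \text{min} = 1500\ \text{min} = 25\ \text{hr} \quad \text{(short method, all items, at most)}$$

$$15 \times 8\ \text{hr} = 120\ \text{hr} \quad \text{(L-Method, fifteen items ranked)}$$

$$24 \times 8\ \text{hr} = 192\ \text{hr} \quad \text{(L-Method, twenty-four further items, estimated)}$$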

The great saving in computational labor has evidently produced no
substantially poorer selection of items, as indicated by the high proportion
of common items. This is further indicated by the reliability
coefficients, which are .70 and .71, respectively, for the short method
and the L-Method. The reliability coefficients were computed by one
of the methods recommended by Kuder and Richardson,* Formula 8.


This study suggests that a very simple approximation to a multiple
correlation procedure will give results substantially as good as those
obtained from a theoretically closer approximation.

* This seems to be an appropriate place to state that the second author named
was responsible for the work done by the L-Method; and the first author named,
for the short method.


BOOK REVIEWS


E. G. Williamson and J. G. Darley. Student Personnel Work.
New York: The McGraw-Hill Book Company, 1937, pp.
xxiv + 313.


The essential impracticability of any thoroughgoing treatment of 
guidance at the college level, such as is presented in this very good 
book, is that it assumes an educational philosophy quite at variance 
with current practice. The more realistic counsellors are aware of 
this, and attempt to make their position stronger by defining “personnel
work” as an attempt “to deliver the student to the classroom
in the optimum condition for profiting from instruction.” This seems
to the reviewer to be an empirical and unjustified delimitation. Guid- 
ance is much more than that, but it presupposes a degree of educational 
individualization that has not as yet been approximated. Of course, 
we should teach individuals after considering their interests and apti- 
tudes and needs, but we do not. It is even doubtful if such is the 
case at the great University of Minnesota. The fact that but one 
student in ten is serviced by the University Testing Bureau implies 
as much.

Another characteristic type of optimism which is illustrated in the 
writings of guidance specialists is the scope they give to formal educa- 
tion. Williamson and Darley, for example, imply that education 
should “provide . . . training not only in the realm of occupational
activity but also in the total realm of life adjustments . . .” (p. 19).
This is rather broad, to put it mildly. Schools, higher or lower, are
not equipped to do everything, and this reviewer for one hopes the
day may never come when formal educational institutions will attempt
to provide training in the “total realm of life adjustments.” There
are, after all, other social institutions. The family has not yet dis- 
appeared completely—nor has the church, nor have the Boy Scouts 
of America. Let them do something. We educators have not done 
so well in our efforts to stimulate the intellectual growth of young 
people that we would be justified in attempting everything else. 

The above comments cast no unique reflection on the present 
book. Williamson and Darley are at their best when they describe 
the excellent program at Minnesota, and this takes up the last two- 
thirds of the volume. Preceding this description there are chapters 
dealing with “American Education,” “Achieving Individualization in
Education,” and “Surveys of Personnel Work.” In their survey the
authors first develop their concept of the scope of personnel work, and
then summarize briefly what has been done with respect to the prediction
of student achievement, freshman orientation, testing, curricular
readjustment, counseling, vocational information, placement, and
the “coördination of personnel work versus the centralization of
personnel work.” Coördination is recommended. The treatment of
“Analytic Techniques in Counseling” is superior and consists chiefly
of a description and criticism of the many types of tests, inventories,
and questionnaires which have been developed primarily for use with college
students.

A great deal of emphasis throughout the book is placed upon the 
clinical approach in guidance: “. . . our primary thesis is that
clinical techniques and facilities are the heart of personnel work”
(p. 81). As the authors contend, insistence upon a clinical procedure 
goes far to insure that guidance will include more than the administra- 
tion of numerous tests. Tests are necessary, but the most significant 
work of a guidance expert comes after the tests have been given and 
scored. 

The chapters dealing with “Student Problems and Treatment” and
“Illustrative Case Histories” are valid, interesting, and informative.
A frank evaluation of the work of the Minnesota Testing Bureau con- 
cludes the volume. This aspect of most guidance programs has been 
woefully neglected, and yet without evaluation personnel work is apt 
to degenerate into wishful thinking. Williamson and Darley base 
their conclusions regarding the value of the work done by the Minne- 
sota Bureau upon an intensive study of one hundred ninety-six cases, 
and while they warn the reader against undue optimism (“careless
generalization of the results—would be fallacious, or at least premature,”
p. 270), the evidence presented is impressive.


STEPHEN M. COREY.
University of Wisconsin. 


E. G. Williamson. Students and Occupations. New York: Henry
Holt & Co., 1937, pp. 437. 


Courses in occupational information on the college level are not as 
numerous as they might be, nor have there been many volumes pub- 
lished suitable as textbooks for such courses. Therefore, this volume 
is a welcome contribution to the literature in this area. The author 
informs us that his book is the outgrowth of teaching a course in 
occupations at the General College, University of Minnesota.

The first three chapters of the volume deal with “What Colleges
Can Do for Students,” “The Making of a Vocational Choice,” and “The
World of Workers.” The remaining seventeen chapters cover specific
occupations or families of occupations.

The first chapter is a clear-cut exposition of the purpose of a college
education. This is the type of information that should percolate down 
to the senior in high school. It is exceptionally good material for 
vocational and educational counselors in secondary schools. The 
chapter on vocational choice discusses the methods of choosing a 
vocation that are more or less familiar to trained vocational counselors. 
Two cases illustrate the ‘clinical method,’ which the author, in 
collaboration with J. G. Darley, has treated in their book, Student 
Personnel Work. Chapter III serves as the occupational spring-board. 
It surveys the general types of occupations and provides the student 
with factual material regarding occupational trends. 

In the section of the book devoted to information about specific 
occupations there are, in addition to the more traditional occupations, 
chapters on the following: forestry, social welfare, library, occupations
in art, and skilled workers in industrial occupations. It is extremely 
wholesome that information about skilled workers should appear in an 
occupational textbook for college students. It supplies a note of 
reality, because many college students may find themselves in indus- 
trial occupations even though they aspire to white collar or professional 
pursuits.

The book is interestingly written and should appeal to college 
students. It is well documented with factual material of recent pub- 
lication. Though the author lays no claim to completeness regarding 
references and suggested readings at the close of each chapter, the 
reviewer notes that much good material on specific occupations has 
been omitted.


ROY N. ANDERSON.
Ohio State University.


Vernon W. Grant. Psychological Optics. Chicago: The Profes-
sional Press, 1938, pp. 240. 


This book is essentially a simplified practical treatise on perception 
in the field of vision. After an orientation in the fundamentals of 
reactive behavior, the author discusses the visual reactive system and 
the various aspects of visual experience including illusions. Appar- 
ently the author aimed for a generalized organization of materials 
rather than a documented analysis of evidence in the field. Consequently,
the text may be characterized as elementary and introductory
rather than exhaustive. It is designed for students and practical
workers in the field.

Visual reactions are regarded as adaptive acts. Although the book 
is written in terms of response psychology, adequate consideration is 
given to conscious impressions. Much is made of attention in relation 
to visual experience. Notice is given to such concepts as form con- 
stancy, size constancy, and figure and ground without introducing 
systematic Gestalt interpretations. A list of one hundred sixty-nine 
references is appended. 

Further amplification of certain materials would foster greater 
clearness of exposition. This could be achieved by a more liberal 
citation of experimental evidence in support of generalizations. 

Certain specific criticisms suggest themselves: (1) For eye-move- 
ment time in reading, one-thirteenth is a better mean figure than one- 
tenth. (2) Recent literature suggests a more frequent occurrence of 
congenital word blindness (non-readers) than one in two thousand. 
(3) Many will disagree with the statement that “of the three outstanding
general theories of color vision . . . that of Ladd-Franklin seems
to have been least weakened by criticism.” (4) Discussion of color
blindness is quite inadequate. Furthermore, the recent literature 
indicates a greater frequency of red-green blindness than the figures 
cited. (5) The definition of visual illusions is not psychological. An 
image displaced by refraction of light rays is not an illusion. 

The book is clearly written and for the most part the material is 
interestingly presented. It is the first extensive treatment of visual 
perception on the elementary level. As such it will be welcomed by 
certain students and workers in the field of vision. 


MILES A. TINKER.
University of Minnesota. 


Joseph K. Hart. Mind in Transition. New York: Covici-Friede,
1938, pp. 413. 


The major point developed in this book is that the confusion of
the world today is due to the fact that in past years, nature and the
social order of things have undergone marked evolutionary changes
but the human mind has remained primitive and unchanging and
consequently ineffective in dealing with the present. “Primitive
men, living in a world of gross uncertainties and surrounded by all the
terrors of the unknown, built for themselves such stable organizations
of group life and control over their environment as they could,
assumed that these arrangements were final, and let their minds ‘set’
in the patterns thus accepted. That primitive mind has persisted
ever since, down to the present, and it still sets the pattern of life and
belief for the great majority of men.” The author further states that
the age-old ways of acting and thinking have become part of our 
neural patternings so that it is difficult to free ourselves from them. 
However, the salvation of man from recurring wars, depressions, and 
conflicts depends upon the reconstruction of this primitive mind so 
that it will be more scientific and progressive and in step with the 
endless change of nature. 

Dr. Hart presents his point well, though at times his arguments 
are somewhat repetitious. Many psychologists will object to his 
Jungian-like reasoning that there are deep, unconscious, inherited 
layers of mind that are reservoirs of archaic ideas and thoughts. 

JAMES D. PAGE.
The University of Rochester. 


Henry W. Simon. Preface to Teaching. New York: Oxford Univer- 
sity Press, 1938, pp. 98. 


Apparently the policy initiated by Dean James E. Russell, of 
having all kinds of educational opinion represented by the staff of 
Teachers College, is being continued by its present Dean. One gets a 
slight shock as one reads the title of the first chapter of this admirable 
work: “Why the Teacher Cannot Reform the World,” and remembers
such Teachers College contributions as “Dare the School Build a New
Social Order?” Simon is obviously the antidote to Counts, as Bagley
was to Kilpatrick. 

This book, which is scarcely larger than a pamphlet, is the best 
introduction to teaching that the reviewer has ever read. Consisting 
of two sections, “What the Job is” and “How to do it,” it is written
in short, vigorous English words reminding one forcibly of the style of 
the Bible. It is full of horse sense. Placed in the hands of those 
entering upon the profession of teaching, it should give them a proper 
perspective. It would be especially valuable in the teacher-training 
institutions that are mostly manned by Progressive professors, but the 
reviewer is afraid that it is just those institutions that will place 
it on the Index. Nevertheless, such a courageous work will have a 
deservedly wide sale.


PETER SANDIFORD.
University of Toronto.